On 2019-02-28, Carl Shapiro wrote:
It is possible, today, to treat PyObject* as an opaque handle if you do not stray far from the limited API.
Is this an argument for not introducing a PyHandle data type? Do you think it is better just to make PyObject work as handles?
If PyHandles are not mapped one-to-one to an object, identity comparisons will need to go through a function call. Furthermore, a compatibility scheme such as converting a PyHandle to a PyObject* would be more complicated than a simple wrapping of the PyHandle as the resulting PyObject* would not be identity equal to any other PyObject* referring to the same object.
Good point. I was thinking implicitly that PyHandles would not be mapped one-to-one. However, if CPython makes PyHandle just a type cast from PyObject, people are going to do pointer compares and then be surprised that their extension breaks with other runtimes.
There are a lot of design considerations and experience with handles in other languages that can inform a design for CPython. For example, references in Java's JNI are most commonly implemented as a handle that indirectly references an object.
Thank you for bringing up JNI. I was vaguely aware of it but after doing some reading last night, I see it solves many of the same problems we are trying to solve.
As such, a user of JNI must be careful to compare references using the IsSameObject predicate instead of an ordinary == compare in C. Despite JNI being >20 years old, this remains counterintuitive and is a common source of bugs, as you can infer from this Android SDK guide
So, what's your opinion on that choice? Not requiring a one-to-one mapping for the handles makes things easier for the runtime but the API is harder to use correctly. Should we follow the JNI model or should we pay the cost to get one-to-one mapping?
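To make the trade-off concrete, here is a minimal sketch of the JNI-style model, where handles are not mapped one-to-one to objects. Everything in it (Object, HandleSlot, handle_new, handle_is_same) is a hypothetical name for illustration, not an existing CPython or JNI API. It only shows why == on two handles can say "different" while an IsSameObject-style predicate says "same".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: none of this is CPython or JNI API. */
typedef struct { int payload; } Object;         /* stand-in for a runtime object */
typedef struct { Object *target; } HandleSlot;  /* one slot per handle, not per object */
typedef HandleSlot *PyHandle;                   /* opaque to extension code */

#define MAX_HANDLES 64
static HandleSlot slots[MAX_HANDLES];
static size_t n_slots = 0;

/* Each call hands out a fresh slot, so two handles to one object differ. */
PyHandle handle_new(Object *obj)
{
    assert(n_slots < MAX_HANDLES);
    slots[n_slots].target = obj;
    return &slots[n_slots++];
}

/* Identity test in the style of JNI's IsSameObject: compare the referents,
 * not the handle values themselves. */
bool handle_is_same(PyHandle a, PyHandle b)
{
    return a->target == b->target;
}
```

With this layout, an extension that writes a == b gets a false negative whenever the two handles came from separate handle_new calls, which is the bug pattern described for JNI above.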
If the runtime also pays the memory cost to keep a reference count in the handle table, I think I see how we could just make PyObject be the opaque handle.
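A rough sketch of what I mean by keeping the reference count in the handle table, assuming a fixed-size table and a linear scan purely for brevity (a real runtime would presumably use a hash lookup). The names HandleEntry, handle_acquire and handle_release are made up for this example. Because every object gets at most one live entry, plain pointer comparison on handles works again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: a sketch, not a proposed API. */
typedef struct { int payload; } Object;  /* stand-in for a runtime object */
typedef struct {
    Object *target;   /* the object this entry refers to */
    size_t  refcnt;   /* handle references held by extension code */
} HandleEntry;
typedef HandleEntry *PyHandle;

#define TABLE_SIZE 64
static HandleEntry table[TABLE_SIZE];

/* Return the existing entry for obj if there is one, else claim a free one.
 * This keeps the handle-to-object mapping one-to-one. */
PyHandle handle_acquire(Object *obj)
{
    HandleEntry *free_entry = NULL;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].refcnt > 0 && table[i].target == obj) {
            table[i].refcnt++;
            return &table[i];
        }
        if (table[i].refcnt == 0 && free_entry == NULL)
            free_entry = &table[i];
    }
    assert(free_entry != NULL);  /* table full: out of scope for a sketch */
    free_entry->target = obj;
    free_entry->refcnt = 1;
    return free_entry;
}

/* Drop one reference; the entry becomes reusable when the count hits zero. */
void handle_release(PyHandle h)
{
    assert(h->refcnt > 0);
    if (--h->refcnt == 0)
        h->target = NULL;
}
```

The cost is visible in the sketch: one extra counter per entry, plus a lookup on every acquire to find the existing entry.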
A good implementation of JNI will avoid using stack addresses or dense integers as a handle value because it is too hard to ensure those values are not stale and do not alias to something that shouldn’t belong to you.
Interesting. If you give up on binary compatibility, you could have a debug build option that enables encryption of handle values. Disable that for better performance in release builds. Maybe that's poor software engineering though (like not having array bounds checking turned on by default).
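Something like the following is what I have in mind, as a sketch only. It uses a plain XOR with a fixed key as a stand-in for real encryption, and HANDLE_DEBUG, handle_key, handle_encode and handle_decode are all invented names. The point is just that in a debug build a handle value is no longer a usable pointer, so code that casts it directly fails early instead of appearing to work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; XOR is a placeholder, not real encryption. */
typedef struct { int payload; } Object;  /* stand-in for a runtime object */
typedef uintptr_t PyHandle;              /* opaque integer-sized handle */

#ifndef HANDLE_DEBUG
#define HANDLE_DEBUG 1                   /* assumed build switch; 0 for release */
#endif

#if HANDLE_DEBUG
/* Fixed here for the sketch; a runtime would pick this randomly at startup. */
static uintptr_t handle_key = (uintptr_t)0x5bd1e995u;
#else
static uintptr_t handle_key = 0;         /* release: handle is the raw pointer */
#endif

/* Scramble the object address into a handle value. */
PyHandle handle_encode(Object *obj)
{
    return (uintptr_t)obj ^ handle_key;
}

/* Recover the object address; only the runtime is supposed to call this. */
Object *handle_decode(PyHandle h)
{
    return (Object *)(h ^ handle_key);
}
```

Note that the two build modes produce different handle values for the same object, which is why this only works if binary compatibility between debug and release extensions is given up.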
Because of all of the accumulated experience with handles in other systems, I think CPython is positioned to do much better than its predecessors.
We better study those systems then. I think it would be best to not be too creative and stick to a design that has been proven to work. JNI looks to be a goldmine of ideas (not that we have to make all the same decisions).
Do you have suggestions for other native interfaces that should be studied? It looks like CoreCLR has something but it seems to require using C++. At least, the exception handling uses C++ features. E.g.
EX_TRY / EX_CATCH / EX_END_CATCH
https://github.com/dotnet/coreclr/blob/master/Documentation/botr/exceptions.md
Possible other sources of ideas: Common Lisp, LuaJIT, Smalltalk implementations, Erlang, Haskell implementations. Compared to those, I would suspect JNI is much more heavily used in practice.
Regards,
Neil