On Thu, Feb 28, 2019 at 11:11 AM Neil Schemenauer <nas-python@arctrix.com> wrote:
Implementing this handle layer for CPython should be quite easy. PyObject_FromRef can just be a typecast from pyref* to PyObject*. No extra piece of memory needs to be allocated because the pyref object already has space for the reference count. In debug builds, CPython should check that the handle API is used correctly (e.g. add a field to keep track that handles are properly closed).
It is possible, today, to treat PyObject* as an opaque handle if you do not stray far from the limited API. (PyPy is less restricted than that.) In my experience, this kind of handle can be implemented as a pair of an object pointer and a reference count, with the PyObject* pointing to that pair. These pairs would be stored together with other handles in a dense array, something that is easy to allocate from and easy for the garbage collector to visit. The reference count field does add a word of overhead, but that is offset by not storing reference counting metadata in the rest of your heap objects.
An interesting property of PyObject* relative to your proposal here is that PyObject* is a direct pointer to an object. This means code expects to be able to compare a PyObject* for identity equality using == as one would do for any other object in C. To ensure that every PyObject* has this property, a mapping from an object to its unique handle must be done when passing it between Python and C. Different implementation techniques for this mapping will make the lookup faster or slower.
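To make the last two paragraphs concrete, here is a minimal sketch of a dense table of (object pointer, reference count) pairs that hands out exactly one slot per object, so that == on the slot pointers means object identity. All names (handle_slot, handle_acquire, handle_release) are hypothetical and are not CPython or PyPy API. The lookup is a linear scan, which is the slow end of the spectrum; a hash table keyed on the object address would be the obvious faster variant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not CPython or PyPy API. */
#define HANDLE_TABLE_SIZE 64

typedef struct {
    void  *object;    /* the real heap object; NULL marks a free slot */
    size_t refcount;  /* references held by C code through this slot */
} handle_slot;

/* Dense array: easy to allocate from, and a GC can scan it linearly. */
static handle_slot handle_table[HANDLE_TABLE_SIZE];

/* Return the unique slot for `object`, creating it if needed.
   Returns NULL when the table is full. */
static handle_slot *handle_acquire(void *object)
{
    handle_slot *free_slot = NULL;
    assert(object != NULL);
    for (size_t i = 0; i < HANDLE_TABLE_SIZE; i++) {
        if (handle_table[i].object == object) {
            /* Object already has a slot: reuse it so == keeps working. */
            handle_table[i].refcount++;
            return &handle_table[i];
        }
        if (free_slot == NULL && handle_table[i].object == NULL)
            free_slot = &handle_table[i];
    }
    if (free_slot != NULL) {
        free_slot->object = object;
        free_slot->refcount = 1;
    }
    return free_slot;
}

/* Drop one reference; the slot becomes free when the count hits zero. */
static void handle_release(handle_slot *slot)
{
    assert(slot != NULL && slot->refcount > 0);
    if (--slot->refcount == 0)
        slot->object = NULL;
}
```

Acquiring the same object twice returns the same slot pointer with a count of 2, which is what lets existing C code keep comparing with ==.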
This relates to an interesting consequence of something like PyHandle. If PyHandles are not mapped one-to-one to an object, identity comparisons will need to go through a function call. Furthermore, a compatibility scheme such as converting a PyHandle to a PyObject* would be more complicated than a simple wrapping of the PyHandle as the resulting PyObject* would not be identity equal to any other PyObject* referring to the same object.
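As a rough illustration of that consequence, the sketch below allocates a fresh handle on every request, so two handles to the same object are different pointers and identity has to be asked through a function. The names (demo_handle, demo_handle_new, demo_handle_is, demo_handle_close) are made up for this example and are not part of any proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle type: every handle is its own small allocation,
   so handles are NOT mapped one-to-one to objects. */
typedef struct demo_handle {
    void *object;
} demo_handle;

/* Create a new handle for `object`; returns NULL on allocation failure. */
static demo_handle *demo_handle_new(void *object)
{
    demo_handle *h = malloc(sizeof *h);
    if (h != NULL)
        h->object = object;
    return h;
}

/* Identity test: compares the referenced objects, not the handles. */
static bool demo_handle_is(const demo_handle *a, const demo_handle *b)
{
    assert(a != NULL && b != NULL);
    return a->object == b->object;
}

/* Close a handle; the handle must not be used afterwards. */
static void demo_handle_close(demo_handle *h)
{
    free(h);
}
```

With this layout, h1 == h2 is false for two handles to the same object while demo_handle_is(h1, h2) is true, which is exactly the trap for code that is used to comparing pointers.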
There are a lot of design considerations and experience with handles in other languages that can inform a design for CPython. For example, references in Java's JNI are most commonly implemented as a handle that indirectly references an object. As such, a user of JNI must be careful to compare references using the IsSameObject predicate instead of an ordinary == comparison in C. Despite JNI being >20 years old, this remains counterintuitive and is a common source of bugs, as you can infer from this Android SDK guide:
https://developer.android.com/training/articles/perf-jni
Another lesson we can learn from JNI is that all of the bugs associated with file descriptors apply to handles. Because references to things in memory are more common than file descriptors, these bugs become far more common. A good implementation of JNI will avoid using stack addresses or dense integers as a handle value because it is too hard to ensure those values are not stale and do not alias something that shouldn’t belong to you. Therefore, a good implementation typically avoids recycling references and obfuscates their values using some form of encryption. This adds to the overhead of using a reference and to the complexity of implementing JNI.
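To show the stale-handle problem in miniature, here is a sketch that uses a per-slot generation counter. Note that this is a different and simpler technique than the encryption-style obfuscation mentioned above, and I am not claiming any JNI implementation works this way. The names (gen_handle, gen_open, gen_deref, gen_close) and the 16/16 bit split are assumptions made for the example. The generation wraps after 65536 reuses of a slot, so this only makes aliasing unlikely, not impossible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 16u
#define GEN_HANDLE_INVALID UINT32_MAX

/* Hypothetical handle value: high 16 bits generation, low 16 bits index. */
typedef uint32_t gen_handle;

typedef struct {
    void    *object;      /* NULL when the slot is free */
    uint16_t generation;  /* bumped each time the slot is closed */
} gen_slot;

static gen_slot slots[SLOT_COUNT];

/* Open a handle for `object`; returns GEN_HANDLE_INVALID if the table is full. */
static gen_handle gen_open(void *object)
{
    assert(object != NULL);
    for (uint32_t i = 0; i < SLOT_COUNT; i++) {
        if (slots[i].object == NULL) {
            slots[i].object = object;
            return ((uint32_t)slots[i].generation << 16) | i;
        }
    }
    return GEN_HANDLE_INVALID;
}

/* Return the object, or NULL if the handle is stale or malformed. */
static void *gen_deref(gen_handle h)
{
    uint32_t index = h & 0xFFFFu;
    uint16_t generation = (uint16_t)(h >> 16);
    if (index >= SLOT_COUNT)
        return NULL;
    if (slots[index].object == NULL || slots[index].generation != generation)
        return NULL;
    return slots[index].object;
}

/* Close a handle; the slot may be reused, but old copies of h stay invalid. */
static void gen_close(gen_handle h)
{
    uint32_t index = h & 0xFFFFu;
    assert(gen_deref(h) != NULL);  /* debug check: catches a double close */
    slots[index].object = NULL;
    slots[index].generation++;     /* invalidates every outstanding copy of h */
}
```

After gen_close(h), the same slot can be handed out again, but the new handle carries a different generation, so a leftover copy of h dereferences to NULL instead of silently aliasing the new object.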
Because of all of the accumulated experience with handles in other systems, I think CPython is positioned to do much better than its predecessors. Having a PyHandle prototype as a third-party extension for experimentation purposes will go even further to help avoid making subtle mistakes that affect developers for decades to come.