On 2019-03-03, Neil Schemenauer wrote:
> What is not clear to me yet is how difficult it would be for other Python VMs to implement a handle API that gives that [one-to-one] invariant.
Some more details on why I think this could be hard to implement. Maybe I'm missing a fast way to do it.
If the VM moves objects, something like handles is necessary. Otherwise, when the GC moves an object, it has no way to update the pointers it has already handed out through the API. Handles fix that with another level of indirection.
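To make the indirection point concrete, here is a toy sketch in Python. It assumes a "heap" modelled as a dict from integer addresses to objects, and a handle table that stores each handle's current address; `gc_move`, `heap` and `handle_table` are hypothetical names for illustration only. A raw address copied out to extension code goes stale after a move, while the handle still resolves because the GC rewrote the table slot.

```python
from typing import Dict, List, Optional


def gc_move(heap: Dict[int, str], handle_table: List[int], old: int, new: int) -> None:
    """Toy moving-GC step: relocate one object and fix up the handle table."""
    heap[new] = heap.pop(old)
    # The GC owns the handle table, so it can rewrite every slot that
    # pointed at the old address (the table acts like a set of GC roots).
    for i, addr in enumerate(handle_table):
        if addr == old:
            handle_table[i] = new
    # It has no way to find raw addresses already copied into extension code.


heap: Dict[int, str] = {0x1000: "spam"}  # address -> object
handle_table: List[int] = [0x1000]      # handle 0 -> current address

raw_ptr = 0x1000  # what an extension would keep without handles
handle = 0        # what it keeps with one level of indirection

gc_move(heap, handle_table, 0x1000, 0x2000)

stale: Optional[str] = heap.get(raw_ptr)       # None: the raw pointer dangles
via_handle: str = heap[handle_table[handle]]  # "spam": the handle still works
```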
Doing handles without the one-to-one invariant is easy and fast. You can keep a table of handles (lookup is O(1), since the handle value indexes directly into the table) and allocate a handle by taking the next free slot in the table. If you want to fill holes, you can keep a free-list like malloc does. The Android indirect_reference code I linked is a fancier version that also supports local-reference behavior (stack-like deallocation of handles).
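A minimal sketch of such a table, assuming handles are plain integer indices and closed slots are reused LIFO from a free-list (`HandleTable`, `new`, `get` and `close` are hypothetical names, not any VM's real API). Note that nothing stops the same object from getting two different handles.

```python
from typing import Any, List

_FREE = object()  # sentinel marking an empty (closed) slot


class HandleTable:
    """Handles are plain indices into a list; closed slots go on a free-list."""

    def __init__(self) -> None:
        self._slots: List[Any] = []
        self._free: List[int] = []  # indices of holes, reused LIFO

    def new(self, obj: Any) -> int:
        # Fill a hole if there is one, otherwise grow the table by one slot.
        if self._free:
            h = self._free.pop()
            self._slots[h] = obj
            return h
        self._slots.append(obj)
        return len(self._slots) - 1

    def get(self, h: int) -> Any:
        # O(1): the handle value is the index into the table.
        obj = self._slots[h]
        if obj is _FREE:
            raise KeyError(f"handle {h} is closed")
        return obj

    def close(self, h: int) -> None:
        if self._slots[h] is _FREE:
            raise KeyError(f"handle {h} already closed")
        self._slots[h] = _FREE
        self._free.append(h)


table = HandleTable()
x = [1, 2, 3]
h1 = table.new(x)
h2 = table.new(x)    # same object, different handle: no one-to-one invariant
table.close(h1)
h3 = table.new("y")  # reuses the hole left by h1
```

The free-list keeps allocation O(1) while bounding table growth; the local-reference scheme mentioned above would instead free a whole frame's handles at once.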
If you want the handles to correspond one-to-one with the managed objects, how can you do that? An obvious approach would be to use a hash table or a similar O(1) data structure keyed on the pointer address of the managed object. I.e. when you need a handle, look up in the table whether one already exists and, if so, use that.
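A toy sketch of the address-keyed approach, assuming objects are identified by integer addresses and a Python dict stands in for the hash table (`OneToOneHandles`, `get_handle` and `address_of` are hypothetical names). Freeing handles is left out to keep it short.

```python
from typing import Dict, List


class OneToOneHandles:
    """At most one handle per object, found via a table keyed on its address."""

    def __init__(self) -> None:
        self._slots: List[int] = []         # handle -> object address
        self._by_addr: Dict[int, int] = {}  # object address -> handle

    def get_handle(self, addr: int) -> int:
        # O(1) expected: reuse the existing handle if this object has one.
        h = self._by_addr.get(addr)
        if h is None:
            h = len(self._slots)
            self._slots.append(addr)
            self._by_addr[addr] = h
        return h

    def address_of(self, h: int) -> int:
        return self._slots[h]


handles = OneToOneHandles()
h_a = handles.get_handle(0x1000)
h_a_again = handles.get_handle(0x1000)  # same object -> same handle
h_b = handles.get_handle(0x1010)        # different object -> new handle
```

This only works as long as an object's address never changes, which is exactly what a moving GC breaks.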
A moving GC has no problem updating the indirect references in the handle table: it can just treat the table as another set of GC roots. However, since the managed object is moving, you can no longer use its pointer address to look up the handle (the hash value has changed). When moving an object, you could remove its entry from the table and then add it back under the new address after the move. That would make the GC process a lot slower, though.
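The sketch below illustrates where that extra cost comes from. It is a toy under stated assumptions: a dict-based `Heap` of integer addresses, a hypothetical `compact` function standing in for a compacting GC, and a `rehash_ops` counter added purely to count the remove-and-reinsert work. Fixing the slots is a cheap root update; re-keying the address index is the additional per-object hash-table work.

```python
from typing import Dict, List


class Heap:
    """Toy heap: address -> payload. Addresses are just ints here."""

    def __init__(self) -> None:
        self.mem: Dict[int, object] = {}
        self.next_addr = 0x1000

    def alloc(self, payload: object) -> int:
        addr = self.next_addr
        self.next_addr += 16
        self.mem[addr] = payload
        return addr


class OneToOneHandles:
    """One handle per object, found via a table keyed on the object's address."""

    def __init__(self, heap: Heap) -> None:
        self.heap = heap
        self.slots: List[int] = []         # handle -> current address
        self.by_addr: Dict[int, int] = {}  # current address -> handle
        self.rehash_ops = 0                # extra work caused by moving

    def get_handle(self, addr: int) -> int:
        h = self.by_addr.get(addr)
        if h is None:
            h = len(self.slots)
            self.slots.append(addr)
            self.by_addr[addr] = h
        return h

    def deref(self, h: int) -> object:
        return self.heap.mem[self.slots[h]]


def compact(heap: Heap, handles: OneToOneHandles, new_base: int) -> None:
    """Toy moving GC: relocate every object to a contiguous run at new_base."""
    moved = {old: new_base + 16 * i for i, old in enumerate(sorted(heap.mem))}
    heap.mem = {moved[a]: p for a, p in heap.mem.items()}
    heap.next_addr = new_base + 16 * len(moved)

    # Cheap part: the slots are just another set of GC roots to rewrite.
    handles.slots = [moved[a] for a in handles.slots]

    # Expensive part: the address-keyed index is stale. Remove every entry
    # under its old address, then re-insert it under the new one (removing
    # all first avoids clashing with an old address reused as a new one).
    old_entries = list(handles.by_addr.items())
    for old_addr, _ in old_entries:
        del handles.by_addr[old_addr]
    for old_addr, h in old_entries:
        handles.by_addr[moved[old_addr]] = h
        handles.rehash_ops += 1


heap = Heap()
a, b, c = heap.alloc("a"), heap.alloc("b"), heap.alloc("c")
hs = OneToOneHandles(heap)
ha, hc = hs.get_handle(a), hs.get_handle(c)
compact(heap, hs, 0x9000)
```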
You could give every managed object an ID field. The bad news is that you have doubled the storage size of small objects like floats and small fixed-size ints (assuming the VM can otherwise store them unboxed). The same field could be used to make the Python id() function return a stable value. It seems one-to-one handles could use whatever solution the VM uses to implement id(). The PyPy docs say this about id():
https://pypy.readthedocs.io/en/latest/cpython_differences.html
    Using the default GC (called minimark), the built-in function
    id() works like it does in CPython. With other GCs it returns
    numbers that are not real addresses (because an object can move
    around several times) and calling it a lot can lead to
    performance problem.
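For comparison, a toy sketch of the ID-field idea, assuming every object carries a hypothetical `obj_id` header word assigned from a global counter at allocation (all names here are illustrative, not taken from any VM). Because the handle index is keyed on that ID rather than the address, moving the object needs no re-keying; the `struct` sizes show the doubling for an unboxed 8-byte double plus a 64-bit ID.

```python
import itertools
import struct
from dataclasses import dataclass, field
from typing import Dict, List

_ids = itertools.count(1)  # hypothetical per-VM ID allocator


@dataclass
class ManagedObject:
    payload: float
    addr: int  # changes whenever the GC moves the object
    obj_id: int = field(default_factory=lambda: next(_ids))  # never changes


class IdKeyedHandles:
    """One handle per object, keyed on a stable ID instead of the address."""

    def __init__(self) -> None:
        self._slots: List[ManagedObject] = []
        self._by_id: Dict[int, int] = {}

    def get_handle(self, obj: ManagedObject) -> int:
        h = self._by_id.get(obj.obj_id)
        if h is None:
            h = len(self._slots)
            self._slots.append(obj)
            self._by_id[obj.obj_id] = h
        return h


def stable_id(obj: ManagedObject) -> int:
    # The same field could back a stable Python-level id().
    return obj.obj_id


f = ManagedObject(payload=1.5, addr=0x1000)
g = ManagedObject(payload=2.5, addr=0x1010)
handles = IdKeyedHandles()
h1 = handles.get_handle(f)
f.addr = 0x9000             # the GC moves the object; no re-keying is needed
h2 = handles.get_handle(f)  # still the same handle

# Cost: an unboxed double is 8 bytes; adding a 64-bit ID makes it 16.
unboxed_size = struct.calcsize("=d")
with_id_size = struct.calcsize("=dq")
```

The trade-off is the one described above: moves become cheap for the handle table, but every small object pays for the extra word.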
Regards,
Neil