Question about ref counting and non-movable objects
Hi,

I am trying to write a text about what I have understood of the problems related to the CPython C API, CPython performance, the scientific Python ecosystem and HPy (see https://discuss.python.org/t/c-api-working-group-and-plan-to-get-a-python-c-... and recent mails on this list).

I have a specific question about the effect of ref counting and non-movable objects for interpreters of dynamic languages like Python. Since all fast Python interpreters (and many interpreters for other similar languages) use more complex alternatives, I guess it would be very difficult to get good performance with such methods. However, I don't really know why, or whether it would be possible to get good performance with such algorithms. Interestingly, the Faster CPython project tries to make CPython faster while keeping ref counting and non-movable objects.

I guess people who know the subject (for example PyPy or GraalPy devs) could give good arguments on these questions or point to interesting and serious references.

Best regards,
Pierre
Hi Pierre, Thanks for asking such questions and for driving the conversation on Python's forum.
> I have a specific question about the effect of ref counting and non-movable objects for interpreters of dynamic languages like Python.
> Since all fast Python interpreters (and many interpreters for other similar languages) use more complex alternatives, I guess it would be very difficult to get good performance with such methods.
> However, I don't really know why, or whether it would be possible to get good performance with such algorithms.
Others here have much deeper knowledge of these topics, but let me make a few comments in the hope of promoting discussion. I doubt that techniques like ref-counting or non-movable objects are by themselves provably slow, but they do create difficulties.

Non-movable objects: Why is the object non-movable? Because somewhere there is a pointer to the memory, and the run-time doesn't have information on where that pointer is or what it is used for. This means that not only is the memory not movable, but its internal bytes are also readable by something. If that memory represents something like a PyLong, then you also can't unbox it, and so on. Keeping track of which pieces of memory are visible where is a prerequisite for a host of performance improvements.

Reference counting: The attraction of reference counting and its downfall are both in the "counting" part. A reference count records *that* a reference is being used but not *where* it is being used. The attraction is that code can borrow a reference as long as it can prove that something else keeps the object alive. The downfall is that you can't know whether the reference is being used in some C extension, or in another thread, or whether the object is being written to or only read, etc. This also rules out a host of possible performance improvements. Indeed, in free-threaded Python, extending reference counting to record more information about which thread is accessing an object was a key enhancement.

Regards,
Simon
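As a rough illustration of the two points above, here is a minimal sketch that relies on CPython-specific behaviour. The `Payload` class and the variable names are hypothetical and chosen only for this example. It assumes CPython, where `id()` happens to return the object's memory address; other interpreters such as PyPy behave differently here.

```python
import ctypes
import gc
import sys


class Payload:
    """Hypothetical stand-in for any heap-allocated Python object."""


obj = Payload()

# Reference counting: the count says how many references exist,
# but not who holds them or what they are used for.
base = sys.getrefcount(obj)          # includes the call's own temporary reference
holders = [obj, obj]                 # two more references, stored in a list
alias = obj                          # one more reference, stored in a variable
added = sys.getrefcount(obj) - base  # number of new references, owners unknown

# Non-movable objects: in CPython (an implementation detail), id() is the address.
raw_address = id(obj)                # a plain integer the run-time does not track
gc.collect()                         # CPython's collector never relocates objects
same_object = ctypes.cast(raw_address, ctypes.py_object).value is obj

print(added, same_object)
```

The integer in `raw_address` plays the role of a pointer held by a C extension: the run-time has no record of it, so the object has to stay at that address for the pointer to remain valid.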
Participants (2): Pierre Augier, Simon Cross