
Thanks for the feedback. My responses are inline below. -eric On Wed, Feb 16, 2022 at 6:36 AM Petr Viktorin <encukou@gmail.com> wrote:
Thank you very much for writing this down! It's very helpful to see a concrete proposal, and the current state of this idea. I like the change,
That's good to hear. :)
but I think it's unfortunately more complicated than the PEP suggests.
That would be unsurprising. :)
This proposal is CPython-specific and, effectively, describes internal implementation details.
I think that is a naïve statement. Refcounting is implementation-specific, but it's hardly an *internal* detail.
Sorry for any confusion. I didn't mean to say that refcounting is an internal detail. Rather, I was talking about how the proposed change in refcounting behavior doesn't affect any guaranteed/documented behavior, hence "internal". Perhaps I missed some documented behavior? I was going off the following: * https://docs.python.org/3.11/c-api/intro.html#objects-types-and-reference-co... * https://docs.python.org/3.11/c-api/structures.html#c.Py_REFCNT
There is code that targets CPython specifically, and relies on the details.
Could you elaborate? Do you mean such code relies on specific refcount values?
The refcount has public getters and setters,
Agreed. However, what behavior do users expect and what guarantees do we make? Do we indicate how to interpret the refcount value they receive? What are the use cases under which a user would set an object's refcount to a specific value? Are users setting the refcount of objects they did not create?
and you need a pretty good grasp of the concept to write a C extension.
I would not expect this to be affected by this PEP, except in cases where users are checking/modifying refcounts for objects they did not create (since none of their objects will be immortal).
I think that it's safe to assume that this will break people's code,
Do you have some use case in mind, or an example? From my perspective I'm having a hard time seeing what this proposed change would break. That said, Kevin Modzelewski indicated [1] that there were affected cases for Pyston (though their change in behavior is slightly different). [1] https://mail.python.org/archives/list/python-dev@python.org/message/TPLEYDCX...
and this PEP should convince us that the breakage is worth it rather than dismiss the issue.
Sorry, I didn't mean to be dismissive. I agree that if there is breakage this PEP must address it.
It would be good to note that “container” refers to the GC term, as in https://devguide.python.org/garbage_collector/#identifying-reference-cycles
and not e.g. https://docs.python.org/3/library/collections.abc.html#collections.abc.Conta...
+1
This has a concrete impact on active projects in the Python community. Below we describe several ways in which refcount modification has a real negative effect on those projects. None of that would happen for objects that are truly immutable.
Reducing Cache Invalidation ---------------------------
Explicitly saying “CPU cache” would make the PEP easier to skim.
+1
Every modification of a refcount causes the corresponding cache line to be invalidated. This has a number of effects.
For one, the write must be propagated to other cache levels and to main memory. This has small effect on all Python programs. Immortal objects would provide a slight relief in that regard.
On top of that, multi-core applications pay a price. If two threads are interacting with the same object (e.g. ``None``) then they will end up invalidating each other's caches with each incref and decref. This is true even for otherwise immutable objects like ``True``, ``0``, and ``str`` instances. This is also true even with the GIL, though the impact is smaller.
This looks out of context. Python has a per-process GIL. It should it go after the next section.
This isn't about a data race. I'm talking about how if an object is active in two different threads (on distinct cores) then incref/decref in one thread will invalidate the cache (line) in the other thread. The only impact of the GIL in this case is that the two threads aren't running simultaneously and the cache invalidation on the idle thread has less impact. Perhaps I've missed something?
The proposed solution is obvious enough that two people came to the same conclusion (and implementation, more or less) independently.
Who was it? Assuming it's not a secret :)
Me and Eddit. :) I don't mind saying so.
In the case of per-interpreter GIL, the only realistic alternative is to move all global objects into ``PyInterpreterState`` and add one or more lookup functions to access them. Then we'd have to add some hacks to the C-API to preserve compatibility for the may objects exposed there. The story is much, much simpler with immortal objects
Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together?
I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
IMO, as it is, the PEP's motivation doesn't really stand on its own. It's only worth it as a step towards per-interpreter GIL.
I expect Eddie would argue otherwise, but I probably wouldn't have written this PEP if it weren't for its benefit to per-interpreter GIL.
(We might have a catch-24 situation in the way we currently handle PEPs. That would mean we need to change the process, maybe even permanently. IMO, this PEP will be very helpful in untangling the situation.)
Glad to help. :)
Backward Compatibility -----------------------
This proposal is completely compatible. It is internal-only so no API is changing.
The approach is also compatible with extensions compiled to the stable ABI. Unfortunately, they will modify the refcount and invalidate all the performance benefits of immortal objects. However, the high bit of the refcount will still match ``_Py_IMMORTAL_REFCNT`` so we can still identify such objects as immortal.
So, any extension that uses the stable ABI will break an invariant. What'll be the impact? The total refcount will probably go out of sync, anything else?
The impact would be: objects incref/decref'ed by such a module would be exposed to some of the performance penalties described earlier in the PEP. I expect the potential aggregate cost would be relatively small.
If an extension DECREFs an immortal object, will it still match _Py_IMMORTAL_REFCNT? How is that guaranteed?
It wouldn't match _Py_IMMORTAL_REFCNT, but the high bit of _Py_IMMORTAL_REFCNT would still match. That bit is what we would actually be checking, rather than the full value.
What about extensions compiled with Python 3.11 (with this PEP) that use an older version of the stable ABI, and thus should be compatible with 3.2+? Will they use the old versions of the macros? How will that be tested?
It wouldn't matter unless an object's refcount reached _Py_IMMORTAL_REFCNT, at which point incref/decref would start noop'ing. What is the likelihood (in real code) that an object's refcount would grow that far? Even then, would such an object ever be expected to go back to 0 (and be dealloc'ed)? Otherwise the point is moot.
No user-facing behavior changes, with the following exceptions:
* code that inspects the refcount (e.g. ``sys.getrefcount()`` or directly via ``ob_refcnt``) will see a really, really large value * ``Py_SET_REFCNT()`` will be a no-op for immortal objects
Neither should cause a problem.
Alternate Python Implementations --------------------------------
This proposal is CPython-specific.
IMO it's specific to the C API, which is wider than just CPython. I don't think we can just assume it'll have no impact on other implementations.
Fair enough.
Non-Obvious Consequences ------------------------
* immortal containers effectively immortalize each contained item * the same is true for objects held internally by other objects (e.g. ``PyTypeObject.tp_subclasses``)
So, do immortal lists immortalize values append()ed to them? (Can you even have an immortal list? Are there limits on what can be immortal?)
We have no plans to do more than ever explicitly immortalize objects. So an immortal list is fine but it would have no effect on the immortality of items it contains, other than implicitly (since the list holds a reference to each item). In general, it would be best to only immortalize immutable objects. If we want to share any objects shared between threads without protection (e.g. per-interpreter GIL) then such objects must be immortal and immutable. So lists and dicts, etc. couldn't be shared (assuming we can't prevent further mutation). However, for objects that will never be shared, it can be practical to make some of them immortal too. For example, sys.modules is a per-interpreter dict that we do not expect to ever get freed until the corresponding interpreter is finalized. By making it immortal, we no longer incur the extra overhead during incref/decref. We can apply this idea in the pursuit of getting back some of that 4% performance we lost. At the end of runtime init we can mark *all* objects as immortal and avoid the extra cost in incref/decref. We only need to worry about immutability with objects that we plan on sharing between threads without a GIL. (FYI, we still need to look closely at the impact of this approach on GC.)
* an immortal object's type is effectively immortal
Should this be enforced?
There is nothing to enforce. The object holds a reference to its type so the type will never be cleaned up as long as the immortal object isn't. Hence the type of an immortal object is effectively immortal. We don't need the type to actually be marked as immortal.
* though extremely unlikely (and technically hard), any object could be incref'ed enough to reach ``_Py_IMMORTAL_REFCNT`` and then be treated as immortal What would it take?
Basically, you;d have to do it deliberately (e.g. incref the object in a tight loop). Even with a tight loop it would take a long time to count up to 2^60 or whatever the chosen value is.
This is not meant to be a public feature but rather an internal one. So the proposal does *not* including adding any new public C-API, nor any Python API. However, this does not prevent us from adding (publicly accessible) private API to do things like immortalize an object or tell if one is immortal.
This is a public change.
I agree that the change to the implementation of some public API is certainly public, as is the change in behavior for immortal objects, as is the potential <4% performance regression. By "public feature" I was referring to immortal objects. We are not exposing that to users, other than that they might notice some objects now have a really high refcount that does not change.
Py_INCREF increments the reference count. Py_REFCNT gets the reference count. For immortal objects, Py_INCREF will no longer function as documented in 3.10, and Py_REFCNT can be used to witness it. Both are public API.
You are right that "Increment the reference count for object o." (as documented) will not be true for an immortal object. Instead it would be something like "indicate that there is an additional reference for object o". I'll be sure to update the PEP, to add that change to the docs wording. Regardless, how important is that distinction? If it matters then clearly this proposal needs to change. As an exercise, we can consider one of the most used objects, None, and that we would make it immortal. How would that impact users of Py_INCREF() and Py_REFCNT()?
Immortal Global Objects [snip] There will likely be others we have not enumerated here.
How will the candidates be chosen?
Any objects that we would expect to share globally (ergo otherwise immutable) will be made immortal. That means the static types, the builtin singletons, the objects in _PyRuntimeState.global_objects, etc.
Making a mutable object immortal isn't particularly helpful. The exception is if you can ensure the object isn't actually modified again. Since we aren't enforcing any immutability for immortal objects it didn't make sense to emphasis that relationship.
Is it just not helpful, or is it disallowed?
It is not disallowed. Also, I need to clarify that section since there are cases where making a mutable object immortal can provide performance benefits, as described earlier.
What about __subclasses__/tp_subclasses?
That's one we'll have to deal with specially, e.g. for core static types we'd store the object on PyInterpreterState. Then the __subclasses__ getter would do a lookup on the current interpreter, instead of using tp_subclasses. We could get rid of tp_subclasses or perhaps use it only for the main interpreter.