PEP 683: "Immortal Objects, Using a Fixed Refcount"
Eddie and I would appreciate your feedback on this proposal to support treating some objects as "immortal". The fundamental characteristic of the approach is that we would provide stronger guarantees about immutability for some objects.

A few things to note:

* this is essentially an internal-only change: there are no user-facing changes (aside from affecting any 3rd party code that directly relies on specific refcounts)
* the naive implementation shows a 4% slowdown
* we have a number of strategies that should reduce that penalty
* without immortal objects, the implementation for per-interpreter GIL will require a number of non-trivial workarounds

That last one is particularly meaningful to me since it means we would definitely miss the 3.11 feature freeze. With immortal objects, 3.11 would still be in reach.

-eric

-----------------------

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow <ericsnowcurrently@gmail.com>, Eddie Elizondo <eduardo.elizondorueda@gmail.com>
Discussions-To: python-dev@python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History:
Resolution:

Abstract
========

Under this proposal, any object may be marked as immortal. "Immortal" means the object will never be cleaned up (at least until runtime finalization). Specifically, the `refcount`_ for an immortal object is set to a sentinel value, and that refcount is never changed by ``Py_INCREF()``, ``Py_DECREF()``, or ``Py_SET_REFCNT()``. For immortal containers, the ``PyGC_Head`` is never changed by the garbage collector.

Avoiding changes to the refcount is an essential part of this proposal. For what we call "immutable" objects, it makes them truly immutable. As described further below, this allows us to avoid performance penalties in scenarios that would otherwise be prohibitive.

This proposal is CPython-specific and, effectively, describes internal implementation details.

.. _refcount: https://docs.python.org/3.11/c-api/intro.html#reference-counts

Motivation
==========

Without immortal objects, all objects are effectively mutable. That includes "immutable" objects like ``None`` and ``str`` instances. This is because every object's refcount is frequently modified as it is used during execution. In addition, for containers the runtime may modify the object's ``PyGC_Head``. This runtime-internal state currently prevents full immutability.

This has a concrete impact on active projects in the Python community. Below we describe several ways in which refcount modification has a real negative effect on those projects. None of that would happen for objects that are truly immutable.

Reducing Cache Invalidation
---------------------------

Every modification of a refcount causes the corresponding cache line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels and to main memory. This has a small effect on all Python programs. Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price. If two threads are interacting with the same object (e.g. ``None``) then they will end up invalidating each other's caches with each incref and decref. This is true even for otherwise immutable objects like ``True``, ``0``, and ``str`` instances. This is also true even with the GIL, though the impact is smaller.
Avoiding Data Races
-------------------

Speaking of multi-core, we are considering making the GIL a per-interpreter lock, which would enable true multi-core parallelism. Among other things, the GIL currently protects against races between multiple threads that concurrently incref or decref. Without a shared GIL, two running interpreters could not safely share any objects, even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must have its own copy of *every* object, including the singletons and static types. We have a viable strategy for that but it will require a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable. There would be no races because there would be no modification. This is something that the immortality proposed here would enable for otherwise immutable objects. With immortal objects, support for a per-interpreter GIL becomes much simpler.

Avoiding Copy-on-Write
----------------------

For some applications it makes sense to get the application into a desired initial state and then fork the process for each worker. This can result in a large performance improvement, especially in memory usage. Several enterprise Python users (e.g. Instagram, YouTube) have taken advantage of this. However, the above refcount semantics drastically reduce the benefits and have led to some sub-optimal workarounds.

Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics.

Rationale
=========

The proposed solution is obvious enough that two people came to the same conclusion (and implementation, more or less) independently. Other designs were also considered. Several possibilities have also been discussed on python-dev in past years.

Alternatives include:

* use a high bit to mark "immortal" but do not change ``Py_INCREF()``
* add an explicit flag to objects
* implement via the type (``tp_dealloc()`` is a no-op)
* track via the object's type object
* track with a separate table

Each of the above makes objects immortal, but none of them address the performance penalties from refcount modification described above.

In the case of per-interpreter GIL, the only realistic alternative is to move all global objects into ``PyInterpreterState`` and add one or more lookup functions to access them. Then we'd have to add some hacks to the C-API to preserve compatibility for the many objects exposed there. The story is much, much simpler with immortal objects.

Impact
======

Benefits
--------

Most notably, the cases described in the two examples above stand to benefit greatly from immortal objects. Projects using pre-fork can drop their workarounds. For the per-interpreter GIL project, immortal objects greatly simplify the solution for existing static types, as well as objects exposed by the public C-API.

In general, a strong immutability guarantee for objects enables Python applications to scale like never before. This is because they can then leverage multi-core parallelism without a tradeoff in memory usage. This is reflected in most of the above cases.

Performance
-----------

A naive implementation shows `a 4% slowdown`_. Several promising mitigation strategies will be pursued in the effort to bring it closer to performance-neutral.

On the positive side, immortal objects save a significant amount of memory when used with a pre-fork model.
Also, immortal objects provide opportunities for specialization in the eval loop that would improve performance.

.. _a 4% slowdown: https://github.com/python/cpython/pull/19474#issuecomment-1032944709

Backward Compatibility
----------------------

This proposal is completely compatible. It is internal-only so no API is changing.

The approach is also compatible with extensions compiled to the stable ABI. Unfortunately, they will modify the refcount and invalidate all the performance benefits of immortal objects. However, the high bit of the refcount will still match ``_Py_IMMORTAL_REFCNT`` so we can still identify such objects as immortal.

No user-facing behavior changes, with the following exceptions:

* code that inspects the refcount (e.g. ``sys.getrefcount()`` or directly via ``ob_refcnt``) will see a really, really large value
* ``Py_SET_REFCNT()`` will be a no-op for immortal objects

Neither should cause a problem.

Alternate Python Implementations
--------------------------------

This proposal is CPython-specific.

Security Implications
---------------------

This feature has no known impact on security.

Maintainability
---------------

This is not a complex feature so it should not cause much mental overhead for maintainers. The basic implementation doesn't touch much code so it should not have much impact on maintainability. There may be some extra complexity due to performance penalty mitigation. However, that should be limited to where we immortalize all objects post-init, and that code will be in one place.

Non-Obvious Consequences
------------------------

* immortal containers effectively immortalize each contained item
* the same is true for objects held internally by other objects (e.g. ``PyTypeObject.tp_subclasses``)
* an immortal object's type is effectively immortal
* though extremely unlikely (and technically hard), any object could be incref'ed enough to reach ``_Py_IMMORTAL_REFCNT`` and then be treated as immortal

Specification
=============

The approach involves these fundamental changes:

* add ``_Py_IMMORTAL_REFCNT`` (the magic value) to the internal C-API
* update ``Py_INCREF()`` and ``Py_DECREF()`` to no-op for objects with the magic refcount (or its most significant bit)
* do the same for any other API that modifies the refcount
* stop modifying ``PyGC_Head`` for immortal containers
* ensure that all immortal objects are cleaned up during runtime finalization

Then setting any object's refcount to ``_Py_IMMORTAL_REFCNT`` makes it immortal.

To be clear, we will likely use the most-significant bit of ``_Py_IMMORTAL_REFCNT`` to tell if an object is immortal, rather than comparing with ``_Py_IMMORTAL_REFCNT`` directly.

(There are other minor, internal changes which are not described here.)

This is not meant to be a public feature but rather an internal one. So the proposal does *not* include adding any new public C-API, nor any Python API. However, this does not prevent us from adding (publicly accessible) private API to do things like immortalize an object or tell if one is immortal.

Affected API
------------

API that will now ignore immortal objects:

* (public) ``Py_INCREF()``
* (public) ``Py_DECREF()``
* (public) ``Py_SET_REFCNT()``
* (private) ``_Py_NewReference()``

API that exposes refcounts (unchanged but may now return large values):

* (public) ``Py_REFCNT()``
* (public) ``sys.getrefcount()``

(Note that ``_Py_RefTotal`` and ``sys.gettotalrefcount()`` will not be affected.)
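For example, the resulting incref/decref behavior might look roughly like the following sketch (the names, the chosen bit, and the exact structure here are illustrative only and not part of the proposal)::

    /* Toy model of the proposed behavior -- illustrative names and bit
       choice, not CPython's actual code.  Assumes a 64-bit refcount. */
    #include <Python.h>

    #define _PY_IMMORTAL_BIT_SKETCH  ((Py_ssize_t)1 << 62)

    static inline int
    is_immortal_sketch(PyObject *op)
    {
        /* Only the high bit is tested, so stray increfs/decrefs (e.g.
           from stable-ABI extensions) cannot change the answer. */
        return (op->ob_refcnt & _PY_IMMORTAL_BIT_SKETCH) != 0;
    }

    static inline void
    incref_sketch(PyObject *op)
    {
        if (is_immortal_sketch(op)) {
            return;              /* never write to an immortal refcount */
        }
        op->ob_refcnt++;
    }

    static inline void
    decref_sketch(PyObject *op)
    {
        if (is_immortal_sketch(op)) {
            return;
        }
        if (--op->ob_refcnt == 0) {
            _Py_Dealloc(op);     /* normal deallocation path */
        }
    }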
Immortal Global Objects
-----------------------

The following objects will be made immortal:

* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers, small ints)

There will likely be others we have not enumerated here.

Object Cleanup
--------------

In order to clean up all immortal objects during runtime finalization, we must keep track of them.

For container objects we'll leverage the GC's permanent generation by pushing all immortalized containers there. During runtime shutdown, the strategy will be to first let the runtime try to do its best effort of deallocating these instances normally. Most of the module deallocation will now be handled by pylifecycle.c:finalize_modules, which cleans up the remaining modules as best as we can. It will change which modules are available during __del__, but that's already defined as undefined behavior by the docs. Optionally, we could do some topological ordering to guarantee that user modules will be deallocated first, before the stdlib modules. Finally, anything leftover (if any) can be found through the permanent generation gc list, which we can clear after finalize_modules.

For non-container objects, the tracking approach will vary on a case-by-case basis. In nearly every case, each such object is directly accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or ``PyInterpreterState`` field. We may need to add a tracking mechanism to the runtime state for a small number of objects.

Documentation
-------------

The feature itself is internal and will not be added to the documentation.

We *may* add a note about immortal objects to the following, to help reduce any surprise users may have with the change:

* ``Py_SET_REFCNT()`` (a no-op for immortal objects)
* ``Py_REFCNT()`` (value may be surprisingly large)
* ``sys.getrefcount()`` (value may be surprisingly large)

Other API that might benefit from such notes are currently undocumented. We wouldn't add a note anywhere else (including for ``Py_INCREF()`` and ``Py_DECREF()``) since the feature is otherwise transparent to users.

Rejected Ideas
==============

Equate Immortal with Immutable
------------------------------

Making a mutable object immortal isn't particularly helpful. The exception is if you can ensure the object isn't actually modified again. Since we aren't enforcing any immutability for immortal objects it didn't make sense to emphasize that relationship.

Reference Implementation
========================

The implementation is proposed on GitHub:

https://github.com/python/cpython/pull/19474

Open Issues
===========

* is there any other impact on GC?

References
==========

This was discussed in December 2021 on python-dev:

* https://mail.python.org/archives/list/python-dev@python.org/thread/7O3FUA52Q...
* https://mail.python.org/archives/list/python-dev@python.org/thread/PNLBJBNIQ...

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
+1 for overall idea. Some comments:
Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics.
Could you elaborate? mmap, maybe?

Generally speaking, fork is very difficult to use safely. My company's web apps load applications and libraries *after* fork, not *before* fork, for safety. We have already changed multiprocessing to use spawn by default on macOS. So I don't recommend that most Python users use fork.

So if you know how to get the benefit of CoW without fork, I want to know it.
How about interned strings? Should the intern dict belong to the runtime, or to each (sub)interpreter?

If the interned dict belongs to the runtime, everything in it should be immortal so it can be shared between subinterpreters. If the interned dict belongs to each interpreter, should we register immortalized strings with all interpreters?

Regards, -- Inada Naoki <songofacandy@gmail.com>
On Wed, Feb 16, 2022 at 12:37 AM Inada Naoki <songofacandy@gmail.com> wrote:
+1 for overall idea.
Great!
Sorry if I got your hopes up. Yeah, I was talking about mmap.
There will likely be others we have not enumerated here.
How about interned strings?
Marking every interned string as immortal may make sense.
Excellent questions. Making immutable objects immortal is relatively simple. For the most part, mutable objects should not be shared between interpreters without protection (e.g. the GIL). The interned dict isn't exposed to Python code or the C-API, so there's less risk, but it still wouldn't work without cleverness. So it should be per-interpreter. It would be nice if it were global though. :)
If the interned dict belongs to each interpreter, should we register immortalized strings with all interpreters?
That's a good point. It may be worth doing something like that. -eric
On Thu, Feb 17, 2022 at 7:01 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Is there any common tool that utilizes CoW via mmap? If you know of one, please add a link to it in the PEP. If there is no such common tool, most Python users can not get the benefit from this.

Generally speaking, fork is a legacy API. It is too difficult to know which library is fork-safe, even for stdlibs. And Windows users can not use fork. Optimizing for non-fork use cases is much better than optimizing for fork use cases.

* https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234
* https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.p...
* https://www.evanjones.ca/fork-is-dangerous.html
* https://bugs.python.org/issue33725

I hope per-interpreter GIL replaces fork use cases. But tools using CoW without fork are also welcome, especially if they support Windows.

Anyway, I don't believe stopping refcounting will fix the CoW issue yet. See this article [1] again.

[1] https://instagram-engineering.com/dismissing-python-garbage-collection-at-in...

Note that they failed to fix CoW by stopping the refcounting of code objects! (*) Most CoW was caused by the cyclic GC and by finalization.

(*) It is not surprising to me because the eval loop doesn't incref/decref most code attributes; it borrows references from the code object.

So we need a sample application that we can profile before saying this fixes CoW. Could you provide some data, or drop the CoW issue from this PEP until it is proven?

Regards, -- Inada Naoki <songofacandy@gmail.com>
On Wed, Feb 16, 2022 at 8:45 PM Inada Naoki <songofacandy@gmail.com> wrote:
Sorry, I'm not aware of any, but I also haven't researched the topic much. Regardless, that would be a good line of inquiry. A reference like that would probably help make the PEP a bit more justifiable without per-interpreter GIL. :)
+1
I hope per-interpreter GIL replaces fork use cases.
Yeah, that's definitely one big benefit.
But tools using CoW without fork are also welcome, especially if they support Windows.
+1
That's definitely an important point, given that the main objective of the proposal is to allow disabling mutation of runtime-internal object state so that some objects can be made truly immutable. I'm sure Eddie has some good insight on the matter (and may have even been involved in writing that article). Eddie?
Note that they failed to fix CoW by stopping the refcounting of code objects! (*) Most CoW was caused by the cyclic GC and by finalization.
That's a good observation!
(*) It is not surprising to me because the eval loop doesn't incref/decref most code attributes; it borrows references from the code object.
+1
We'll look into that. -eric
Hey Inada, thanks for the feedback
Generally speaking, fork is a legacy API. It is too difficult to know which library is fork-safe, even for stdlibs.
Yes, this is something where Instagram has to go to great lengths to make sure that we get the entire execution into a state where it's safe to fork. It works, but it's hard to maintain. We'd rather have a simpler model!
I hope per-interpreter GIL replaces fork use cases.
We hope so too, hence the big push towards having immutable shared state across the interpreters. For large applications like Instagram, this is a must, otherwise copying state into every interpreter would be too costly.
Anyway, I don't believe stopping refcounting will fix the CoW issue yet. See this article [1] again.
That article is five years old, so it doesn't reflect the current state of the system! We have continuous profiling and monitoring of copy-on-writes, and after introducing the techniques described in this PEP, we have largely fixed the majority of scenarios where this happens.

You are right that just addressing reference counting will not fix all CoW issues. The trick here is also to leverage the permanent GC generation used for the `gc.freeze` API. That is, if you have a container that is known to be immortal, it should be pushed into the permanent GC generation. This guarantees that the GC itself will not change the GC headers of said instance. Thus, if you immortalize your heap before forking (using the techniques in: https://github.com/python/cpython/pull/31489) then you'll end up removing the vast majority of scenarios where CoW takes place.

I can look into writing a new technical article for Instagram with more up-to-date info, but this might take time to get through!

Now, I said that we've *largely* fixed the CoW issue because there are still places where it happens, such as free lists and the small-object allocator. But these are relatively small compared to the ones coming from reference counts and the GC head mutations.
On Wed, Feb 23, 2022 at 1:46 AM Eddie Elizondo via Python-Dev <python-dev@python.org> wrote:
The same technique doesn't guarantee the same benefit. Just as gc.freeze() is needed before immortalizing to avoid CoW, some other tricks may be needed too.

A new article is welcome, but I want a sample application that we can run, profile, and use to measure the benefits.

Regards, -- Inada Naoki <songofacandy@gmail.com>
On 16. 02. 22 1:10, Eric Snow wrote:
Thank you very much for writing this down! It's very helpful to see a concrete proposal, and the current state of this idea. I like the change, but I think it's unfortunately more complicated than the PEP suggests.
I think that is a naïve statement. Refcounting is implementation-specific, but it's hardly an *internal* detail. There is code that targets CPython specifically, and relies on the details. The refcount has public getters and setters, and you need a pretty good grasp of the concept to write a C extension. I think that it's safe to assume that this will break people's code, and this PEP should convince us that the breakage is worth it rather than dismiss the issue.
It would be good to note that “container” refers to the GC term, as in https://devguide.python.org/garbage_collector/#identifying-reference-cycles and not e.g. https://docs.python.org/3/library/collections.abc.html#collections.abc.Conta...
Explicitly saying “CPU cache” would make the PEP easier to skim.
This looks out of context. Python has a per-process GIL. It should go after the next section.
Who was it? Assuming it's not a secret :)
Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together? IMO, as it is, the PEP's motivation doesn't really stand on its own. It's only worth it as a step towards per-interpreter GIL. (We might have a catch-22 situation in the way we currently handle PEPs. That would mean we need to change the process, maybe even permanently. IMO, this PEP will be very helpful in untangling the situation.)
So, any extension that uses the stable ABI will break an invariant. What'll be the impact? The total refcount will probably go out of sync, anything else? If an extension DECREFs an immortal object, will it still match _Py_IMMORTAL_REFCNT? How is that guaranteed? What about extensions compiled with Python 3.11 (with this PEP) that use an older version of the stable ABI, and thus should be compatible with 3.2+? Will they use the old versions of the macros? How will that be tested?
IMO it's specific to the C API, which is wider than just CPython. I don't think we can just assume it'll have no impact on other implementations.
So, do immortal lists immortalize values append()ed to them? (Can you even have an immortal list? Are there limits on what can be immortal?)
* an immortal object's type is effectively immortal
Should this be enforced?
This is a public change. Py_INCREF increments the reference count. Py_REFCNT gets the reference count. For immortal objects, Py_INCREF will no longer function as documented in 3.10, and Py_REFCNT can be used to witness it. Both are public API.
How will the candidates be chosen?
Is it just not helpful, or is it disallowed? What about __subclasses__/tp_subclasses?
Thanks for the feedback. My responses are inline below. -eric On Wed, Feb 16, 2022 at 6:36 AM Petr Viktorin <encukou@gmail.com> wrote:
That's good to hear. :)
but I think it's unfortunately more complicated than the PEP suggests.
That would be unsurprising. :)
Sorry for any confusion. I didn't mean to say that refcounting is an internal detail. Rather, I was talking about how the proposed change in refcounting behavior doesn't affect any guaranteed/documented behavior, hence "internal". Perhaps I missed some documented behavior? I was going off the following: * https://docs.python.org/3.11/c-api/intro.html#objects-types-and-reference-co... * https://docs.python.org/3.11/c-api/structures.html#c.Py_REFCNT
There is code that targets CPython specifically, and relies on the details.
Could you elaborate? Do you mean such code relies on specific refcount values?
The refcount has public getters and setters,
Agreed. However, what behavior do users expect and what guarantees do we make? Do we indicate how to interpret the refcount value they receive? What are the use cases under which a user would set an object's refcount to a specific value? Are users setting the refcount of objects they did not create?
and you need a pretty good grasp of the concept to write a C extension.
I would not expect this to be affected by this PEP, except in cases where users are checking/modifying refcounts for objects they did not create (since none of their objects will be immortal).
I think that it's safe to assume that this will break people's code,
Do you have some use case in mind, or an example? From my perspective I'm having a hard time seeing what this proposed change would break. That said, Kevin Modzelewski indicated [1] that there were affected cases for Pyston (though their change in behavior is slightly different). [1] https://mail.python.org/archives/list/python-dev@python.org/message/TPLEYDCX...
Sorry, I didn't mean to be dismissive. I agree that if there is breakage this PEP must address it.
+1
+1
This isn't about a data race. I'm talking about how if an object is active in two different threads (on distinct cores) then incref/decref in one thread will invalidate the cache (line) in the other thread. The only impact of the GIL in this case is that the two threads aren't running simultaneously and the cache invalidation on the idle thread has less impact. Perhaps I've missed something?
Me and Eddie. :) I don't mind saying so.
I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
IMO, as it is, the PEP's motivation doesn't really stand on its own. It's only worth it as a step towards per-interpreter GIL.
I expect Eddie would argue otherwise, but I probably wouldn't have written this PEP if it weren't for its benefit to per-interpreter GIL.
Glad to help. :)
The impact would be: objects incref/decref'ed by such a module would be exposed to some of the performance penalties described earlier in the PEP. I expect the potential aggregate cost would be relatively small.
If an extension DECREFs an immortal object, will it still match _Py_IMMORTAL_REFCNT? How is that guaranteed?
It wouldn't match _Py_IMMORTAL_REFCNT, but the high bit of _Py_IMMORTAL_REFCNT would still match. That bit is what we would actually be checking, rather than the full value.
It wouldn't matter unless an object's refcount reached _Py_IMMORTAL_REFCNT, at which point incref/decref would start noop'ing. What is the likelihood (in real code) that an object's refcount would grow that far? Even then, would such an object ever be expected to go back to 0 (and be dealloc'ed)? Otherwise the point is moot.
Fair enough.
We have no plans to do anything more than explicitly immortalize specific objects. So an immortal list is fine, but it would have no effect on the immortality of the items it contains, other than implicitly (since the list holds a reference to each item).

In general, it would be best to only immortalize immutable objects. If we want to share any objects between threads without protection (e.g. with a per-interpreter GIL) then such objects must be immortal and immutable. So lists and dicts, etc. couldn't be shared (assuming we can't prevent further mutation).

However, for objects that will never be shared, it can be practical to make some of them immortal too. For example, sys.modules is a per-interpreter dict that we do not expect to ever get freed until the corresponding interpreter is finalized. By making it immortal, we no longer incur the extra overhead during incref/decref.

We can apply this idea in the pursuit of getting back some of that 4% performance we lost. At the end of runtime init we can mark *all* objects as immortal and avoid the extra cost in incref/decref. We only need to worry about immutability with objects that we plan on sharing between threads without a GIL.

(FYI, we still need to look closely at the impact of this approach on GC.)
* an immortal object's type is effectively immortal
Should this be enforced?
There is nothing to enforce. The object holds a reference to its type so the type will never be cleaned up as long as the immortal object isn't. Hence the type of an immortal object is effectively immortal. We don't need the type to actually be marked as immortal.
Basically, you'd have to do it deliberately (e.g. incref the object in a tight loop). Even with a tight loop it would take a long time to count up to 2^60 or whatever the chosen value is. (At a billion increfs per second, reaching 2^60 would take on the order of 36 years.)
I agree that the change to the implementation of some public API is certainly public, as is the change in behavior for immortal objects, as is the potential <4% performance regression. By "public feature" I was referring to immortal objects. We are not exposing that to users, other than that they might notice some objects now have a really high refcount that does not change.
You are right that "Increment the reference count for object o." (as documented) will not be true for an immortal object. Instead it would be something like "indicate that there is an additional reference for object o". I'll be sure to update the PEP, to add that change to the docs wording. Regardless, how important is that distinction? If it matters then clearly this proposal needs to change. As an exercise, we can consider one of the most used objects, None, and that we would make it immortal. How would that impact users of Py_INCREF() and Py_REFCNT()?
Any objects that we would expect to share globally (ergo otherwise immutable) will be made immortal. That means the static types, the builtin singletons, the objects in _PyRuntimeState.global_objects, etc.
It is not disallowed. Also, I need to clarify that section since there are cases where making a mutable object immortal can provide performance benefits, as described earlier.
What about __subclasses__/tp_subclasses?
That's one we'll have to deal with specially, e.g. for core static types we'd store the object on PyInterpreterState. Then the __subclasses__ getter would do a lookup on the current interpreter, instead of using tp_subclasses. We could get rid of tp_subclasses or perhaps use it only for the main interpreter.
On 17. 02. 22 2:13, Eric Snow wrote:
That's what I hoped the PEP would tell me. Instead of simply claiming that there won't be issues, it should explain why we won't have any issues.
IMO, the reasoning should start from the assumption that things will break, and explain why they won't (or why the breakage is acceptable). If the PEP simply tells me upfront that things will be OK, I have a hard time trusting it. IOW, it's clear you've thought about this a lot (especially after reading your replies here), but it's not clear from the PEP. That might be editorial nitpicking, if it weren't for the fact that I want to find any gaps in your research and reasoning, and invite everyone else to look for them as well. [...]
Ah, I see. I was confused by this:
This is also true even with the GIL, though the impact is smaller.
Smaller than what? The baseline for that comparison is a hypothetical GIL-less interpreter, which is only introduced in the next section. Perhaps say something like "Python's GIL helps avoid this effect, but doesn't eliminate it."
Please think about it. If you removed the benefits for per-interpreter GIL, the motivation section would be reduced to is memory savings for fork/CoW. (And lots of performance improvements that are great in theory but sum up to a 4% loss.)
It makes sense once you know _Py_IMMORTAL_REFCNT has two bits set. Maybe it'd be good to note that detail -- it's an internal detail, but crucial for making things safe.
That's exactly the questions I'd hope the PEP to answer. I could estimate that likelihood myself, but I'd really rather just check your work ;) (Hm, maybe I couldn't even estimate this myself. The PEP doesn't say what the value of _Py_IMMORTAL_REFCNT is, and in the ref implementation a comment says "This can be safely changed to a smaller value".) I'll omit the rest of the mail, it's clarifications (thank you!) & variations on the themes above.
Again, thanks for the reply. It's helpful. My further responses are inline below. -eric On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin <encukou@gmail.com> wrote:
Good point. It's easy to dump a bunch of unnecessary info into a PEP, and it was hard for me to know where the line was in this case. There hadn't been much discussion previously about the possible ways this change might break users. So thanks for bringing this up. I'll be sure to put a more detailed explanation in the PEP, with a bit more evidence too.
Ah, I see. I was confused by this:
No worries! I'm glad we cleared it up. I'll make sure the PEP is more understandable about this.
Good point. I'll clarify the point.
Sounds good. Would this involve more than a note at the top of the PEP? And just to be clear, I don't think the fate of a per-interpreter GIL PEP should depend on this one.
Will do.
Got it. I'll be sure that the PEP is more clear about that. Thanks for letting me know.
I experimented with this at the EuroPython sprints in Berlin years ago. I was sitting next to MvL, who had an interesting observation about it. He suggested(*) all the constants unmarshalled as part of loading a module should be "immortal", and if we could rejigger how we allocated them to store them in their own memory pages, that would dovetail nicely with COW semantics, cutting down on the memory use of preforked server processes. //arry/ (*) Assuming I remember what he said accurately, of course. If any of this is dumb assume it's my fault.
On Wed, Feb 16, 2022 at 11:06 AM Larry Hastings <larry@hastings.org> wrote:
I experimented with this at the EuroPython sprints in Berlin years ago. I was sitting next to MvL, who had an interesting observation about it.
Classic MvL! :)
He suggested(*) all the constants unmarshalled as part of loading a module should be "immortal", and if we could rejigger how we allocated them to store them in their own memory pages, that would dovetail nicely with COW semantics, cutting down on the memory use of preforked server processes.
Cool idea. I may mention it in the PEP as a possibility. Thanks! -eric
On 2/19/22 04:41, Antoine Pitrou wrote:
Do applications do that for some reason? Python module reloading is already so marginal, I thought hardly anybody did it. Anyway, my admittedly-dim understanding is that COW is most helpful for the "pre-fork" server model, and I bet those folks never bother to unload modules. //arry/
On Sat, 19 Feb 2022 12:05:22 -0500 Larry Hastings <larry@hastings.org> wrote:
I have no data point, but I would be surprised if there wasn't at least one example of such usage somewhere in the world, for example to hotload fixes in specific parts of an application without restarting it (or as part of a plugin / extension / mod system). There's also the auto-reload functionality in some Web servers or frameworks, but that is admittedly more of a development feature. Regards Antoine.
While it's fine to play devil's advocate, given the choice between "this will help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical production use case" (long-running applications that reload modules enough times that this could waste a significant amount of memory), I think the former is more important. //arry/ On 2/20/22 06:01, Antoine Pitrou wrote:
On Mon, 21 Feb 2022 at 16:47, Larry Hastings <larry@hastings.org> wrote:
While it's fine to play devil's advocate, given the choice between "this will help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical production use case" (long-running applications that reload modules enough times that this could waste a significant amount of memory), I think the former is more important.
Can the cost be mitigated by reusing immortal objects? So, for instance, a module-level constant of 60*60*24*365 might be made immortal, meaning it doesn't get disposed of with the module, but if the module gets reloaded, no *additional* object would be created. I'm assuming here that any/all objects unmarshalled with the module can indeed be shared in this way. If that isn't always true, then that would reduce the savings here. ChrisA
On 2/21/22 22:06, Chris Angelico wrote:
It could, but we don't have any general-purpose mechanism for that. We have "interned strings" and "small ints", but we don't have e.g. "interned tuples" or "frequently-used large ints and floats". That said, in this hypothetical scenario wherein someone is constantly reloading modules but we also have immortal objects, maybe someone could write a smart reloader that lets them somehow propagate existing immortal objects to the new module. It wouldn't even have to be that sophisticated, just some sort of hook into the marshal step combined with a per-module persistent cache of unmarshalled constants. //arry/
On Tue, 22 Feb 2022 at 03:00, Larry Hastings <larry@hastings.org> wrote:
Fair enough. Since only immortal objects would affect this, it may be possible for the smart reloader to simply be told of all new immortals, and it can then intern things itself. IMO that strengthens the argument that prefork servers are a more significant use-case than reloading, without necessarily compromising the rarer case. Thanks for the explanation. ChrisA
fwiw Pyston has immortal objects, though with a slightly different goal and thus design [1]. I'm not necessarily advocating for our design (it makes most sense if there is a JIT involved), but just writing to report our experience of making a change like this and the compatibility effects.

Importantly, our system allows for the reference count of immortal objects to change, as long as it doesn't go below half of the original very-high value. So extension code with no concept of immortality will still update the reference counts of immortal objects, but this is fine. Because of this we haven't seen any issues with extension modules.

The small amount of compatibility challenges we've run into have been in testing code that checks for memory leaks. For example this code breaks on Pyston:

    def test():
        starting_refcount = sys.getrefcount(1)
        doABunchOfStuff()
        assert sys.getrefcount(1) == starting_refcount

This might work with this PEP, but we've also seen code that asserts that the refcount increases by a specific value, which I believe wouldn't. For Pyston we've simply disabled these tests, figuring that our users still have CPython to test on. Personally I consider this breakage to be small, but I hadn't seen anyone mention the potential usage of sys.getrefcount() so I thought I'd bring it up.

- kmod

[1] Our goal is to entirely remove refcounting operations when we can prove we are operating on an immortal object. We can prove it in a couple cases: sometimes simply, such as in Py_RETURN_NONE, but mostly our JIT will often know the immortality of objects it embeds into the code. So if we can prove statically that an object is immortal then we elide the incref/decrefs, and if we can't then we use an unmodified Py_INCREF/Py_DECREF. This means that our reference counts on immortal objects will change, so we detect immortality by checking if the reference count is at least half of the original very-high value.

On Tue, Feb 15, 2022 at 7:13 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Thanks! On Wed, Feb 16, 2022 at 11:19 AM Kevin Modzelewski <kevmod@gmail.com> wrote:
In CPython we will *have* to allow this in order to support binary packages built with earlier CPython versions (assuming they only use the stable ABI). Those packages will necessarily use INCREF/DECREF macros that don't check for the immortality bit. Yes, it will break COW, but nevertheless we have to support the Stable ABI, and INCREF/DECREF are in the Stable ABI. If you want COW you will have to compile such packages from source. --Guido van Rossum (python.org/~guido)
On Wed, Feb 16, 2022 at 12:14 PM Kevin Modzelewski <kevmod@gmail.com> wrote:
fwiw Pyston has immortal objects, though with a slightly different goal and thus design [1]. I'm not necessarily advocating for our design (it makes most sense if there is a JIT involved), but just writing to report our experience of making a change like this and the compatibility effects.
Thanks!
Importantly, our system allows for the reference count of immortal objects to change, as long as it doesn't go below half of the original very-high value. So extension code with no concept of immortality will still update the reference counts of immortal objects, but this is fine. Because of this we haven't seen any issues with extension modules.
As Guido noted, we are taking a similar approach for the sake of older extensions built with the limited API. As a precaution, we start the refcount for immortal objects basically at _Py_IMMORTAL_REFCNT * 1.5. Then we only need to check the high bit of _Py_IMMORTAL_REFCNT to see if an object is immortal.
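(For scale: if the high bit were 2^62 and the initial value roughly 1.5 × 2^62, an extension with no concept of immortality would have to perform on the order of 2^61 more decrefs than increfs on a single object before that bit could clear.)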
Right, this is less of an issue for us since normally we do not change the refcount of immortal objects. Also, CPython's test suite keeps us honest about leaking references and memory blocks. :)
For Pyston we've simply disabled these tests, figuring that our users still have CPython to test on. Personally I consider this breakage to be small, but I hadn't seen anyone mention the potential usage of sys.getrefcount() so I thought I'd bring it up.
Thanks again for that.
[1] Our goal is to entirely remove refcounting operations when we can prove we are operating on an immortal object. We can prove it in a couple cases: sometimes simply, such as in Py_RETURN_NONE, but mostly our JIT will often know the immortality of objects it embeds into the code. So if we can prove statically that an object is immortal then we elide the incref/decrefs, and if we can't then we use an unmodified Py_INCREF/Py_DECREF. This means that our reference counts on immortal objects will change, so we detect immortality by checking if the reference count is at least half of the original very-high value.
FWIW, we anticipate that we can take a similar approach in CPython's eval loop, specializing for immortal objects. We are also updating Py_RETURN_NONE, etc. to stop incref'ing. -eric
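For illustration, the kind of Py_RETURN_NONE change being described might look roughly like this (a sketch under assumed details, not the actual patch):

    #include <Python.h>

    /* Today: every C function that returns None must first write to
       None's refcount. */
    static PyObject *
    returns_none_today(void)
    {
        Py_INCREF(Py_None);
        return Py_None;
    }

    /* With an immortal None, that write can simply be dropped (and
       Py_INCREF would no-op on None anyway), since None can never be
       deallocated no matter what its refcount says. */
    static PyObject *
    returns_none_with_immortality(void)
    {
        return Py_None;
    }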
On 2/15/2022 7:10 PM, Eric Snow wrote:
* the naive implementation shows a 4% slowdown
Without understanding all the benefits, this seems a bit too much for me. 2% would be much better.
* we have a number of strategies that should reduce that penalty
I would like to see that before approving the PEP.
* without immortal objects, the implementation for per-interpreter GIL will require a number of non-trivial workarounds
To me, that says to speed up immortality first.
That last one is particularly meaningful to me since it means we would definitely miss the 3.11 feature freeze.
3 1/2 months from now.
With immortal objects, 3.11 would still be in reach.
Is it worth trying to rush it a bit? -- Terry Jan Reedy
On Wed, Feb 16, 2022 at 2:41 PM Terry Reedy <tjreedy@udel.edu> wrote:
Yeah, we consider 4% to be too much. 2% would be great. Performance-neutral would be even better, of course. :)
* we have a number of strategies that should reduce that penalty
I would like to see that before approving the PEP.
I expect it would be enough to show where things stand with benchmark results. It did not seem like the actual mitigation strategies were as important, so I opted to leave them out to avoid clutter. Plus it isn't clear yet what approaches will help the most, nor how much we can win back. So I didn't want to distract with hypotheticals. If it's important I can add that in.
Agreed.
I'd rather not rush this. I'm saying that, for per-interpreter GIL, 3.11 is within reach without rushing if we have immortal objects. Without them, 3.11 is not realistic without rushing things. -eric
I suggest being a little more explicit (even blatant) that the particular details of:

(1) which subset of functionally immortal objects are marked as immortal
(2) how to mark something as immortal
(3) how to recognize something as immortal
(4) which memory-management activities are skipped or modified for immortal objects

are not only CPython-specific, but are also private implementation details that are expected to change in subsequent versions.

Ideally, things like the interned string dictionary or the constants from a pyc file will be not merely immortal, but stored in an immortal-only memory page, so that they won't be flushed or CoW-ed when a nearby non-immortal object is modified. Getting those details right will make a difference to performance, and you don't want to be locked in to the first draft.

-jJ
On Wed, Feb 16, 2022 at 10:43 PM Jim J. Jewett <jimjjewett@gmail.com> wrote:
Excellent point.
Ideally, things like the interned string dictionary or the constants from a pyc file will be not merely immortal, but stored in an immortal-only memory page, so that they won't be flushed or CoW-ed when a nearby non-immortal object is modified.
That's definitely worth looking into.
Getting those details right will make a difference to performance, and you don't want to be locked in to the first draft.
Yep, that is one big reason I was trying to avoid spelling out every detail of our plan. :) -eric
![](https://secure.gravatar.com/avatar/351a10f392414345ed67a05e986dc4dd.jpg?s=120&d=mm&r=g)
+1 for overall idea. Some comments:
Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics.
Could you elaborate? mmap, maybe? Generally speaking, fork is very difficult to use in safe. My company's web apps load applications and libraries *after* fork, not *before* fork for safety. We had changed multiprocessing to use spawn by default on macOS. So I don't recommend many Python users to use fork. So if you know how to get benefit from CoW without fork, I want to know it.
How about interned strings? Should the intern dict be belonging to runtime, or (sub)interpreter? If the interned dict is belonging to runtime, all interned dict should be immortal to be shared between subinterpreters. If the interned dict is belonging to interpreter, should we register immortalized string to all interpreters? Regards, -- Inada Naoki <songofacandy@gmail.com>
![](https://secure.gravatar.com/avatar/dd4761743695d5efd3692f2a3b35d37d.jpg?s=120&d=mm&r=g)
On Wed, Feb 16, 2022 at 12:37 AM Inada Naoki <songofacandy@gmail.com> wrote:
+1 for overall idea.
Great!
Sorry if I got your hopes up. Yeah, I was talking about mmap.
There will likely be others we have not enumerated here.
How about interned strings?
Marking every interned string as immortal may make sense.
Excellent questions. Making immutable objects immortal is relatively simple. For the most part, mutable objects should not be shared between interpreters without protection (e.g. the GIL). The interned dict isn't exposed to Python code or the C-API, so there's less risk, but it still wouldn't work without cleverness. So it should be per-interpreter. It would be nice if it were global though. :)
If the interned dict is belonging to interpreter, should we register immortalized string to all interpreters?
That's a good point. It may be worth doing something like that. -eric
![](https://secure.gravatar.com/avatar/351a10f392414345ed67a05e986dc4dd.jpg?s=120&d=mm&r=g)
On Thu, Feb 17, 2022 at 7:01 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Is there any common tool that utilize CoW by mmap? If you know, please its link to the PEP. If there is no common tool, most Python users can get benefit from this. Generally speaking, fork is a legacy API. It is too difficult to know which library is fork-safe, even for stdlibs. And Windows users can not use fork. Optimizing for non-fork use case is much better than optimizing for fork use cases. * https://gist.github.com/nicowilliams/a8a07b0fc75df05f684c23c18d7db234 * https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.p... * https://www.evanjones.ca/fork-is-dangerous.html * https://bugs.python.org/issue33725 I hope per-interpreter GIL replaces fork use cases. But tools using CoW without fork also welcome, especially if it supports Windows. Anyway, I don't believe stopping refcounting will fix the CoW issue yet. See this article [1] again. [1] https://instagram-engineering.com/dismissing-python-garbage-collection-at-in... Note that they failed to fix CoW by stopping refcounting code objects! (*) Most CoW was caused by cyclic GC and finalization caused most CoW. (*) It is not surprising to me because eval loop don't incre/decref most code attributes. They borrow reference from the code object. So we need a sample application and profile it, before saying it fixes CoW. Could you provide some data, or drop the CoW issue from this PEP until it is proved? Regards, -- Inada Naoki <songofacandy@gmail.com>
![](https://secure.gravatar.com/avatar/dd4761743695d5efd3692f2a3b35d37d.jpg?s=120&d=mm&r=g)
On Wed, Feb 16, 2022 at 8:45 PM Inada Naoki <songofacandy@gmail.com> wrote:
Sorry, I'm not aware of any, but I also haven't researched the topic much. Regardless, that would be a good line of inquiry. A reference like that would probably help make the PEP a bit more justifiable without per-interpreter GIL. :)
+1
I hope per-interpreter GIL replaces fork use cases.
Yeah, that's definitely one big benefit.
But tools using CoW without fork also welcome, especially if it supports Windows.
+1
That's definitely an important point, given that the main objective of the proposal is to allow disabling mutation of runtime-internal object state so that some objects can be made truly immutable. I'm sure Eddie has some good insight on the matter (and may have even been involved in writing that article). Eddie?
Note that they failed to fix CoW by stopping refcounting code objects! (*) Most CoW was caused by cyclic GC and finalization caused most CoW.
That's a good observation!
(*) It is not surprising to me because eval loop don't incre/decref most code attributes. They borrow reference from the code object.
+1
We'll look into that. -eric
![](https://secure.gravatar.com/avatar/5f0c59c0ed4548113ea5552830986c69.jpg?s=120&d=mm&r=g)
Hey Inada, thanks for the feedback
Generally speaking, fork is a legacy API. It is too difficult to know which library is fork-safe, even for stdlibs.
Yes, this is something that Instagram has to go into great lengths to make sure that we get the entire execution into a state where it's safe to fork. It works, but it's hard to maintain. We'd rather have a simpler model!
I hope per-interpreter GIL replaces fork use cases.
We hope so too, hence the big push towards having immutable shared state across the interpreters. For large applications like Instagram, this is a must, otherwise copying state into every interpreter would be too costly.
Anyway, I don't believe stopping refcounting will fix the CoW issue yet. See this article [1] again.
That article is five years old so it doesn't reflect the current state of the system! We have continuous profiling and monitoring of Copy on Writes and after introducing the techniques described in this PEP, we have largely fixed the majority of scenarios where this happens. You are right in the fact that just addressing reference counting will not fix all CoW issues. The trick here is also to leverage the permanent GC generation used for the `gc.freeze` API. That is, if you have a container that it's known to be immortal, it should be pushed into the permanent GC generation. This will guarantee that the GC itself will not change the GC headers of said instance. Thus, if you immortalize your heap before forking (using the techniques in: https://github.com/python/cpython/pull/31489) then you'll end up removing the vast majority of scenarios where CoW takes place. I can look into writing a new technical article for Instagram with more up to date info but this might take time to get through! Now, I said that we've largely fixed the CoW issue because there are still places where it happens such as: free lists, the small object allocator, etc. But these are relatively small compared to the ones coming from reference counts and the GC head mutations.
![](https://secure.gravatar.com/avatar/351a10f392414345ed67a05e986dc4dd.jpg?s=120&d=mm&r=g)
On Wed, Feb 23, 2022 at 1:46 AM Eddie Elizondo via Python-Dev <python-dev@python.org> wrote:
Same technique don't guarantee same benefit. Like gc.freeze() is needed before immortalize to avoid CoW, some other tricks may be needed too. New article is welcome, but I want sample application we can run, profile, and measure the benefits. Regards, -- Inada Naoki <songofacandy@gmail.com>
![](https://secure.gravatar.com/avatar/870d613430249e453343efc9667ef636.jpg?s=120&d=mm&r=g)
On 16. 02. 22 1:10, Eric Snow wrote:
Thank you very much for writing this down! It's very helpful to see a concrete proposal, and the current state of this idea. I like the change, but I think it's unfortunately more complicated than the PEP suggests.
I think that is a naïve statement. Refcounting is implementation-specific, but it's hardly an *internal* detail. There is code that targets CPython specifically, and relies on the details. The refcount has public getters and setters, and you need a pretty good grasp of the concept to write a C extension. I think that it's safe to assume that this will break people's code, and this PEP should convince us that the breakage is worth it rather than dismiss the issue.
It would be good to note that “container” refers to the GC term, as in https://devguide.python.org/garbage_collector/#identifying-reference-cycles and not e.g. https://docs.python.org/3/library/collections.abc.html#collections.abc.Conta...
Explicitly saying “CPU cache” would make the PEP easier to skim.
This looks out of context. Python has a per-process GIL. It should it go after the next section.
Who was it? Assuming it's not a secret :)
Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together? IMO, as it is, the PEP's motivation doesn't really stand on its own. It's only worth it as a step towards per-interpreter GIL. (We might have a catch-24 situation in the way we currently handle PEPs. That would mean we need to change the process, maybe even permanently. IMO, this PEP will be very helpful in untangling the situation.)
So, any extension that uses the stable ABI will break an invariant. What'll be the impact? The total refcount will probably go out of sync, anything else? If an extension DECREFs an immortal object, will it still match _Py_IMMORTAL_REFCNT? How is that guaranteed? What about extensions compiled with Python 3.11 (with this PEP) that use an older version of the stable ABI, and thus should be compatible with 3.2+? Will they use the old versions of the macros? How will that be tested?
IMO it's specific to the C API, which is wider than just CPython. I don't think we can just assume it'll have no impact on other implementations.
So, do immortal lists immortalize values append()ed to them? (Can you even have an immortal list? Are there limits on what can be immortal?)
* an immortal object's type is effectively immortal
Should this be enforced?
This is a public change. Py_INCREF increments the reference count. Py_REFCNT gets the reference count. For immortal objects, Py_INCREF will no longer function as documented in 3.10, and Py_REFCNT can be used to witness it. Both are public API.
How will the candidates be chosen?
Is it just not helpful, or is it disallowed? What about __subclasses__/tp_subclasses?
Thanks for the feedback. My responses are inline below. -eric On Wed, Feb 16, 2022 at 6:36 AM Petr Viktorin <encukou@gmail.com> wrote:
That's good to hear. :)
but I think it's unfortunately more complicated than the PEP suggests.
That would be unsurprising. :)
Sorry for any confusion. I didn't mean to say that refcounting is an internal detail. Rather, I was talking about how the proposed change in refcounting behavior doesn't affect any guaranteed/documented behavior, hence "internal". Perhaps I missed some documented behavior? I was going off the following: * https://docs.python.org/3.11/c-api/intro.html#objects-types-and-reference-co... * https://docs.python.org/3.11/c-api/structures.html#c.Py_REFCNT
There is code that targets CPython specifically, and relies on the details.
Could you elaborate? Do you mean such code relies on specific refcount values?
The refcount has public getters and setters,
Agreed. However, what behavior do users expect and what guarantees do we make? Do we indicate how to interpret the refcount value they receive? What are the use cases under which a user would set an object's refcount to a specific value? Are users setting the refcount of objects they did not create?
and you need a pretty good grasp of the concept to write a C extension.
I would not expect this to be affected by this PEP, except in cases where users are checking/modifying refcounts for objects they did not create (since none of their objects will be immortal).
I think that it's safe to assume that this will break people's code,
Do you have some use case in mind, or an example? From my perspective I'm having a hard time seeing what this proposed change would break. That said, Kevin Modzelewski indicated [1] that there were affected cases for Pyston (though their change in behavior is slightly different). [1] https://mail.python.org/archives/list/python-dev@python.org/message/TPLEYDCX...
Sorry, I didn't mean to be dismissive. I agree that if there is breakage this PEP must address it.
+1
+1
This isn't about a data race. I'm talking about how if an object is active in two different threads (on distinct cores) then incref/decref in one thread will invalidate the cache (line) in the other thread. The only impact of the GIL in this case is that the two threads aren't running simultaneously and the cache invalidation on the idle thread has less impact. Perhaps I've missed something?
Me and Eddie. :) I don't mind saying so.
I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
IMO, as it is, the PEP's motivation doesn't really stand on its own. It's only worth it as a step towards per-interpreter GIL.
I expect Eddie would argue otherwise, but I probably wouldn't have written this PEP if it weren't for its benefit to per-interpreter GIL.
Glad to help. :)
The impact would be: objects incref/decref'ed by such a module would be exposed to some of the performance penalties described earlier in the PEP. I expect the potential aggregate cost would be relatively small.
If an extension DECREFs an immortal object, will it still match _Py_IMMORTAL_REFCNT? How is that guaranteed?
It wouldn't match _Py_IMMORTAL_REFCNT, but the high bit of _Py_IMMORTAL_REFCNT would still match. That bit is what we would actually be checking, rather than the full value.
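(A rough sketch of that check, with placeholder values: the real constant and bit position may differ in the reference implementation, but the point is that only one high bit is tested, not the full value.)

    # Placeholder values for illustration only; not the actual CPython constants.
    _Py_IMMORTAL_BIT = 1 << 62
    _Py_IMMORTAL_REFCNT = _Py_IMMORTAL_BIT | (_Py_IMMORTAL_BIT >> 1)   # two high bits set

    def _is_immortal(refcnt):
        # Stray increfs/decrefs from older extensions nudge the count up or down,
        # but never far enough to clear this bit (or to reach zero).
        return (refcnt & _Py_IMMORTAL_BIT) != 0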
It wouldn't matter unless an object's refcount reached _Py_IMMORTAL_REFCNT, at which point incref/decref would start noop'ing. What is the likelihood (in real code) that an object's refcount would grow that far? Even then, would such an object ever be expected to go back to 0 (and be dealloc'ed)? Otherwise the point is moot.
Fair enough.
We have no plans to do anything more than explicitly immortalize specific objects. So an immortal list is fine, but it would have no effect on the immortality of the items it contains, other than implicitly (since the list holds a reference to each item).

In general, it would be best to only immortalize immutable objects. If we want to share any objects between threads without protection (e.g. per-interpreter GIL) then such objects must be immortal and immutable. So lists and dicts, etc. couldn't be shared (assuming we can't prevent further mutation).

However, for objects that will never be shared, it can be practical to make some of them immortal too. For example, sys.modules is a per-interpreter dict that we do not expect to ever get freed until the corresponding interpreter is finalized. By making it immortal, we no longer incur the extra overhead during incref/decref.

We can apply this idea in the pursuit of getting back some of that 4% performance we lost. At the end of runtime init we can mark *all* objects as immortal and avoid the extra cost in incref/decref. We only need to worry about immutability for objects that we plan on sharing between threads without a GIL. (FYI, we still need to look closely at the impact of this approach on GC.)
* an immortal object's type is effectively immortal
Should this be enforced?
There is nothing to enforce. The object holds a reference to its type so the type will never be cleaned up as long as the immortal object isn't. Hence the type of an immortal object is effectively immortal. We don't need the type to actually be marked as immortal.
Basically, you'd have to do it deliberately (e.g. incref the object in a tight loop). Even with a tight loop it would take a long time to count up to 2^60 or whatever the chosen value is.
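(Back-of-the-envelope arithmetic, assuming a very generous rate of about a billion increfs per second:)

    seconds = 2**60 / 1e9                    # about 1.15e9 seconds of non-stop incref'ing
    years = seconds / (60 * 60 * 24 * 365)
    print(round(years))                      # roughly 37 years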
I agree that the change to the implementation of some public API is certainly public, as is the change in behavior for immortal objects, as is the potential <4% performance regression. By "public feature" I was referring to immortal objects. We are not exposing that to users, other than that they might notice some objects now have a really high refcount that does not change.
You are right that "Increment the reference count for object o." (as documented) will not be true for an immortal object. Instead it would be something like "indicate that there is an additional reference for object o". I'll be sure to update the PEP, to add that change to the docs wording. Regardless, how important is that distinction? If it matters then clearly this proposal needs to change. As an exercise, we can consider one of the most used objects, None, and that we would make it immortal. How would that impact users of Py_INCREF() and Py_REFCNT()?
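(As a hedged illustration of that exercise, here is what a user of sys.getrefcount() might observe, assuming None is made immortal as proposed:)

    import sys

    before = sys.getrefcount(None)
    refs = [None] * 1000                     # take ~1000 new references to None
    after = sys.getrefcount(None)

    # On 3.10: after is about 1000 greater than before.
    # Under this PEP (assumed): before == after, both a very large, unchanging value.

Py_INCREF() and Py_DECREF() callers in C would see the same thing: the calls still balance, but the stored count for None never moves.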
Any objects that we would expect to share globally (ergo otherwise immutable) will be made immortal. That means the static types, the builtin singletons, the objects in _PyRuntimeState.global_objects, etc.
It is not disallowed. Also, I need to clarify that section since there are cases where making a mutable object immortal can provide performance benefits, as described earlier.
What about __subclasses__/tp_subclasses?
That's one we'll have to deal with specially, e.g. for core static types we'd store the object on PyInterpreterState. Then the __subclasses__ getter would do a lookup on the current interpreter, instead of using tp_subclasses. We could get rid of tp_subclasses or perhaps use it only for the main interpreter.
On 17. 02. 22 2:13, Eric Snow wrote:
That's what I hoped the PEP would tell me. Instead of simply claiming that there won't be issues, it should explain why we won't have any issues.
IMO, the reasoning should start from the assumption that things will break, and explain why they won't (or why the breakage is acceptable). If the PEP simply tells me upfront that things will be OK, I have a hard time trusting it. IOW, it's clear you've thought about this a lot (especially after reading your replies here), but it's not clear from the PEP. That might be editorial nitpicking, if it wasn't for the fact that I want to find any gaps in your research and reasoning, and invite everyone else to look for them as well. [...]
Ah, I see. I was confused by this:
This is also true even with the GIL, though the impact is smaller.
Smaller than what? The baseline for that comparison is a hypothetical GIL-less interpreter, which is only introduced in the next section. Perhaps say something like "Python's GIL helps avoid this effect, but doesn't eliminate it."
Please think about it. If you removed the benefits for per-interpreter GIL, the motivation section would be reduced to is memory savings for fork/CoW. (And lots of performance improvements that are great in theory but sum up to a 4% loss.)
It makes sense once you know _Py_IMMORTAL_REFCNT has two bits set. Maybe it'd be good to note that detail -- it's an internal detail, but crucial for making things safe.
That's exactly the questions I'd hope the PEP to answer. I could estimate that likelihood myself, but I'd really rather just check your work ;) (Hm, maybe I couldn't even estimate this myself. The PEP doesn't say what the value of _Py_IMMORTAL_REFCNT is, and in the ref implementation a comment says "This can be safely changed to a smaller value".) I'll omit the rest of the mail, it's clarifications (thank you!) & variations on the themes above.
Again, thanks for the reply. It's helpful. My further responses are inline below. -eric On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin <encukou@gmail.com> wrote:
Good point. It's easy to dump a bunch of unnecessary info into a PEP, and it was hard for me to know where the line was in this case. There hadn't been much discussion previously about the possible ways this change might break users. So thanks for bringing this up. I'll be sure to put a more detailed explanation in the PEP, with a bit more evidence too.
Ah, I see. I was confused by this:
No worries! I'm glad we cleared it up. I'll make sure the PEP is more understandable about this.
Good point. I'll clarify the point.
Sounds good. Would this involve more than a note at the top of the PEP? And just to be clear, I don't think the fate of a per-interpreter GIL PEP should depend on this one.
Will do.
Got it. I'll be sure that the PEP is more clear about that. Thanks for letting me know.
I experimented with this at the EuroPython sprints in Berlin years ago. I was sitting next to MvL, who had an interesting observation about it. He suggested(*) all the constants unmarshalled as part of loading a module should be "immortal", and if we could rejigger how we allocated them to store them in their own memory pages, that would dovetail nicely with COW semantics, cutting down on the memory use of preforked server processes. //arry/ (*) Assuming I remember what he said accurately, of course. If any of this is dumb assume it's my fault.
On Wed, Feb 16, 2022 at 11:06 AM Larry Hastings <larry@hastings.org> wrote:
I experimented with this at the EuroPython sprints in Berlin years ago. I was sitting next to MvL, who had an interesting observation about it.
Classic MvL! :)
He suggested(*) all the constants unmarshalled as part of loading a module should be "immortal", and if we could rejigger how we allocated them to store them in their own memory pages, that would dovetail nicely with COW semantics, cutting down on the memory use of preforked server processes.
Cool idea. I may mention it in the PEP as a possibility. Thanks! -eric
On 2/19/22 04:41, Antoine Pitrou wrote:
Do applications do that for some reason? Python module reloading is already so marginal, I thought hardly anybody did it. Anyway, my admittedly-dim understanding is that COW is most helpful for the "pre-fork" server model, and I bet those folks never bother to unload modules. //arry/
On Sat, 19 Feb 2022 12:05:22 -0500 Larry Hastings <larry@hastings.org> wrote:
I have no data point, but I would be surprised if there wasn't at least one example of such usage somewhere in the world, for example to hotload fixes in specific parts of an application without restarting it (or as part of a plugin / extension / mod system). There's also the auto-reload functionality in some Web servers or frameworks, but that is admittedly more of a development feature. Regards Antoine.
While I don't think it's fine to play devil's advocate, given the choice between "this will help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical production use case" (long-running applications that reload modules enough times this could waste a significant amount of memory), I think the former is more important. //arry/ On 2/20/22 06:01, Antoine Pitrou wrote:
On Mon, 21 Feb 2022 at 16:47, Larry Hastings <larry@hastings.org> wrote:
While I don't think it's fine to play devil's advocate, given the choice between "this will help a common production use-case" (pre-fork servers) and "this could hurt a hypothetical production use case" (long-running applications that reload modules enough times this could waste a significant amount of memory), I think the former is more important.
Can the cost be mitigated by reusing immortal objects? So, for instance, a module-level constant of 60*60*24*365 might be made immortal, meaning it doesn't get disposed of with the module, but if the module gets reloaded, no *additional* object would be created. I'm assuming here that any/all objects unmarshalled with the module can indeed be shared in this way. If that isn't always true, then that would reduce the savings here. ChrisA
On 2/21/22 22:06, Chris Angelico wrote:
It could, but we don't have any general-purpose mechanism for that. We have "interned strings" and "small ints", but we don't have e.g. "interned tuples" or "frequently-used large ints and floats". That said, in this hypothetical scenario wherein someone is constantly reloading modules but we also have immortal objects, maybe someone could write a smart reloader that lets them somehow propagate existing immortal objects to the new module. It wouldn't even have to be that sophisticated, just some sort of hook into the marshal step combined with a per-module persistent cache of unmarshalled constants. //arry/
On Tue, 22 Feb 2022 at 03:00, Larry Hastings <larry@hastings.org> wrote:
Fair enough. Since only immortal objects would affect this, it may be possible for the smart reloader to simply be told of all new immortals, and it can then intern things itself. IMO that strengthens the argument that prefork servers are a more significant use-case than reloading, without necessarily compromising the rarer case. Thanks for the explanation. ChrisA
fwiw Pyston has immortal objects, though with a slightly different goal and thus design [1]. I'm not necessarily advocating for our design (it makes most sense if there is a JIT involved), but just writing to report our experience of making a change like this and the compatibility effects.

Importantly, our system allows for the reference count of immortal objects to change, as long as it doesn't go below half of the original very-high value. So extension code with no concept of immortality will still update the reference counts of immortal objects, but this is fine. Because of this we haven't seen any issues with extension modules.

The small number of compatibility challenges we've run into have been in testing code that checks for memory leaks. For example, this code breaks on Pyston:

    def test():
        starting_refcount = sys.getrefcount(1)
        doABunchOfStuff()
        assert sys.getrefcount(1) == starting_refcount

This might work with this PEP, but we've also seen code that asserts that the refcount increases by a specific value, which I believe wouldn't. For Pyston we've simply disabled these tests, figuring that our users still have CPython to test on. Personally I consider this breakage to be small, but I hadn't seen anyone mention the potential usage of sys.getrefcount() so I thought I'd bring it up.

- kmod

[1] Our goal is to entirely remove refcounting operations when we can prove we are operating on an immortal object. We can prove it in a couple cases: sometimes simply, such as in Py_RETURN_NONE, but mostly our JIT will often know the immortality of objects it embeds into the code. So if we can prove statically that an object is immortal then we elide the incref/decrefs, and if we can't then we use an unmodified Py_INCREF/Py_DECREF. This means that our reference counts on immortal objects will change, so we detect immortality by checking if the reference count is at least half of the original very-high value.

On Tue, Feb 15, 2022 at 7:13 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Thanks! On Wed, Feb 16, 2022 at 11:19 AM Kevin Modzelewski <kevmod@gmail.com> wrote:
In CPython we will *have* to allow this in order to support binary packages built with earlier CPython versions (assuming they only use the stable ABI). Those packages will necessarily use INCREF/DECREF macros that don't check for the immortality bit. Yes, it will break COW, but nevertheless we have to support the Stable ABI, and INCREF/DECREF are in the Stable ABI. If you want COW you will have to compile such packages from source. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him* *(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Wed, Feb 16, 2022 at 12:14 PM Kevin Modzelewski <kevmod@gmail.com> wrote:
fwiw Pyston has immortal objects, though with a slightly different goal and thus design [1]. I'm not necessarily advocating for our design (it makes most sense if there is a JIT involved), but just writing to report our experience of making a change like this and the compatibility effects.
Thanks!
Importantly, our system allows for the reference count of immortal objects to change, as long as it doesn't go below half of the original very-high value. So extension code with no concept of immortality will still update the reference counts of immortal objects, but this is fine. Because of this we haven't seen any issues with extension modules.
As Guido noted, we are taking a similar approach for the sake of older extensions built with the limited API. As a precaution, we start the refcount for immortal objects at roughly _Py_IMMORTAL_REFCNT * 1.5. Then, to see if an object is immortal, we only need to check whether the high bit of _Py_IMMORTAL_REFCNT is still set in its refcount.
Right, this is less of an issue for us since normally we do not change the refcount of immortal objects. Also, CPython's test suite keeps us honest about leaking references and memory blocks. :)
For Pyston we've simply disabled these tests, figuring that our users still have CPython to test on. Personally I consider this breakage to be small, but I hadn't seen anyone mention the potential usage of sys.getrefcount() so I thought I'd bring it up.
Thanks again for that.
[1] Our goal is to entirely remove refcounting operations when we can prove we are operating on an immortal object. We can prove it in a couple cases: sometimes simply, such as in Py_RETURN_NONE, but mostly our JIT will often know the immortality of objects it embeds into the code. So if we can prove statically that an object is immortal then we elide the incref/decrefs, and if we can't then we use an unmodified Py_INCREF/Py_DECREF. This means that our reference counts on immortal objects will change, so we detect immortality by checking if the reference count is at least half of the original very-high value.
FWIW, we anticipate that we can take a similar approach in CPython's eval loop, specializing for immortal objects. We are also updating Py_RETURN_NONE, etc. to stop incref'ing. -eric
On 2/15/2022 7:10 PM, Eric Snow wrote:
* the naive implementation shows a 4% slowdown
Without understanding all the benefits, this seems a bit too much for me. 2% would be much better.
* we have a number of strategies that should reduce that penalty
I would like to see that before approving the PEP.
* without immortal objects, the implementation for per-interpreter GIL will require a number of non-trivial workarounds
To me, that says to speed up immortality first.
That last one is particularly meaningful to me since it means we would definitely miss the 3.11 feature freeze.
3 1/2 months from now.
With immortal objects, 3.11 would still be in reach.
Is it worth trying to rush it a bit? -- Terry Jan Reedy
On Wed, Feb 16, 2022 at 2:41 PM Terry Reedy <tjreedy@udel.edu> wrote:
Yeah, we consider 4% to be too much. 2% would be great. Performance-neutral would be even better, of course. :)
* we have a number of strategies that should reduce that penalty
I would like to see that before approving the PEP.
I expect it would be enough to show where things stand with benchmark results. It did not seem like the actual mitigation strategies were as important, so I opted to leave them out to avoid clutter. Plus it isn't clear yet what approaches will help the most, nor how much we can win back. So I didn't want to distract with hypotheticals. If it's important I can add that in.
Agreed.
I'd rather not rush this. I'm saying that, for per-interpreter GIL, 3.11 is within reach without rushing if we have immortal objects. Without them, 3.11 is not realistic without rushing things. -eric
I suggest being a little more explicit (even blatant) that the particular details of:

(1) which subset of functionally immortal objects are marked as immortal
(2) how to mark something as immortal
(3) how to recognize something as immortal
(4) which memory-management activities are skipped or modified for immortal objects

are not only CPython-specific, but are also private implementation details that are expected to change in subsequent versions.

Ideally, things like the interned string dictionary or the constants from a pyc file will be not merely immortal, but stored in an immortal-only memory page, so that they won't be flushed or CoW-ed when a nearby non-immortal object is modified. Getting those details right will make a difference to performance, and you don't want to be locked in to the first draft.

-jJ
On Wed, Feb 16, 2022 at 10:43 PM Jim J. Jewett <jimjjewett@gmail.com> wrote:
Excellent point.
Ideally, things like the interned string dictionary or the constants from a pyc file will be not merely immortal, but stored in an immortal-only memory page, so that they won't be flushed or CoW-ed when a nearby non-immortal object is modified.
That's definitely worth looking into.
Getting those details right will make a difference to performance, and you don't want to be locked in to the first draft.
Yep, that is one big reason I was trying to avoid spelling out every detail of our plan. :) -eric
participants (11)
- Antoine Pitrou
- Chris Angelico
- Eddie Elizondo
- Eric Snow
- Guido van Rossum
- Inada Naoki
- Jim J. Jewett
- Kevin Modzelewski
- Larry Hastings
- Petr Viktorin
- Terry Reedy