PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 2)
Thanks to all those that provided feedback. I've worked to
substantially update the PEP in response. The text is included below.
Further feedback is appreciated.
-eric
------------------------
PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow
Hi,
I hope per-interpreter GIL succeeds at some point, and I know this
PEP is needed for it.
But I worry that a per-interpreter GIL may be too complex for core
developers and extension writers to implement and maintain.
As you know, immortal does not mean shareable between interpreters. It is
very difficult to know which objects can be shared, and where
shareable objects leak into other interpreters.
So I am not sure that a per-interpreter GIL is an achievable goal.
So I think it is too early to introduce immortal objects in Python
3.11, unless they *improve* performance without a per-interpreter GIL.
Instead, we could add a configuration option such as
`--enable-experimental-immortal`.
On Sat, Feb 19, 2022 at 4:52 PM Eric Snow
Reducing CPU Cache Invalidation
-------------------------------

Avoiding Data Races
-------------------
Both benefits require a per-interpreter GIL.
Avoiding Copy-on-Write
----------------------
For some applications it makes sense to get the application into a desired initial state and then fork the process for each worker. This can result in a large performance improvement, especially in memory usage. Several enterprise Python users (e.g. Instagram, YouTube) have taken advantage of this. However, the above refcount semantics drastically reduce the benefits and have led to some sub-optimal workarounds.
As I wrote before, fork is very difficult to use safely. We cannot recommend it to most users. And I don't think reducing the size of Instagram's or YouTube's patches is a good rationale for this kind of change.
Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics. Anything that uses ``mmap`` relies on copy-on-write, including sharing data from shared objects files between processes.
It is very difficult to reduce CoW with mmap(MAP_PRIVATE). You may need to write the hash of bytes and unicode objects, and you may need to write `tp_type`. Immortal objects can "reduce" memory writes. But "at least one memory write" is enough to trigger the CoW.
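For context, the prefork pattern under discussion looks roughly like this. This is a minimal POSIX-only sketch (the module-level state and the worker's computation are placeholders, not anything from the PEP):

```python
import os

# Build expensive shared state in the parent before forking.
data = [str(i) for i in range(100_000)]

pid = os.fork()
if pid == 0:
    # Child: merely *reading* these objects updates their refcounts,
    # which dirties the pages they live on and triggers copy-on-write.
    total = sum(len(s) for s in data)
    os._exit(0)

_, status = os.waitpid(pid, 0)
```

Immortalizing `data` would stop the refcount writes, but as the paragraph above notes, a single write anywhere on a page (a cached hash, a patched pointer) still copies that whole page.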
Accidental Immortality
----------------------
While it isn't impossible, this accidental scenario is so unlikely that we need not worry. Even if done deliberately by using ``Py_INCREF()`` in a tight loop and each iteration only took 1 CPU cycle, it would take 2^61 cycles (on a 64-bit processor). At a fast 5 GHz that would still take nearly 500,000,000 seconds (over 5,000 days)! If that CPU were 32-bit then it would be (technically) more feasible, though still highly unlikely.
Technically, `[obj] * (2**(32-4))` is a 1 GiB list on 32-bit (2**28 pointers of 4 bytes each).
Constraints
-----------
* ensure that otherwise immutable objects can be truly immutable
* be careful when immortalizing objects that are not otherwise immutable
I am not sure what this means. For example, unicode objects are not truly immutable because they have a hash cache, a utf8 cache, and a wchar_t cache. (The wchar_t cache will be removed in Python 3.12.)
Object Cleanup
--------------
In order to clean up all immortal objects during runtime finalization, we must keep track of them.
I don't think we need to clean up all immortal objects.
Of course, we should take care of the objects that are immortal by default.
But for user-marked immortal objects, it is very difficult to guarantee
that __del__ or a weakref callback is called safely.
Additionally, if objects are marked immortal to avoid CoW, cleaning them up causes CoW.
Regards,
--
Inada Naoki
Thanks for the feedback. I've responded inline below.
-eric
On Sat, Feb 19, 2022 at 8:50 PM Inada Naoki
I hope per-interpreter GIL succeeds at some point, and I know this PEP is needed for it.
But I worry that a per-interpreter GIL may be too complex for core developers and extension writers to implement and maintain. As you know, immortal does not mean shareable between interpreters. It is very difficult to know which objects can be shared, and where shareable objects leak into other interpreters. So I am not sure that a per-interpreter GIL is an achievable goal.
I plan on addressing this in the PEP I am working on for per-interpreter GIL. In the meantime, I doubt the issue will impact any core devs.
So I think it is too early to introduce immortal objects in Python 3.11, unless they *improve* performance without a per-interpreter GIL. Instead, we could add a configuration option such as `--enable-experimental-immortal`.
I agree that immortal objects aren't quite as appealing in general without per-interpreter GIL. However, there are actual users that will benefit from it, assuming we can reduce the performance penalty to acceptable levels. For a recent example, see https://mail.python.org/archives/list/python-dev@python.org/message/B77BQQFD....
On Sat, Feb 19, 2022 at 4:52 PM Eric Snow
wrote:

Reducing CPU Cache Invalidation
-------------------------------

Avoiding Data Races
-------------------
Both benefits require a per-interpreter GIL.
CPU cache invalidation exists regardless. With the current GIL, the effect is reduced significantly. Per-interpreter GIL is only one situation where data races matter. Any attempt to generally eliminate the GIL must deal with races on per-object runtime state.
Avoiding Copy-on-Write
----------------------
For some applications it makes sense to get the application into a desired initial state and then fork the process for each worker. This can result in a large performance improvement, especially in memory usage. Several enterprise Python users (e.g. Instagram, YouTube) have taken advantage of this. However, the above refcount semantics drastically reduce the benefits and have led to some sub-optimal workarounds.
As I wrote before, fork is very difficult to use safely. We cannot recommend it to most users. And I don't think reducing the size of Instagram's or YouTube's patches is a good rationale for this kind of change.
What do you mean by "this kind of change"? The proposed change is relatively small. It certainly isn't nearly as intrusive as many changes we make to internals without a PEP. If you are talking about the performance penalty, we should be able to eliminate it.
Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics. Anything that uses ``mmap`` relies on copy-on-write, including sharing data from shared objects files between processes.
It is very difficult to reduce CoW with mmap(MAP_PRIVATE).
You may need to write the hash of bytes and unicode objects, and you may need to write `tp_type`. Immortal objects can "reduce" memory writes. But "at least one memory write" is enough to trigger the CoW.
Correct. However, without immortal objects (AKA immutable per-object runtime-state) it goes from "very difficult" to "basically impossible".
Accidental Immortality
----------------------
While it isn't impossible, this accidental scenario is so unlikely that we need not worry. Even if done deliberately by using ``Py_INCREF()`` in a tight loop and each iteration only took 1 CPU cycle, it would take 2^61 cycles (on a 64-bit processor). At a fast 5 GHz that would still take nearly 500,000,000 seconds (over 5,000 days)! If that CPU were 32-bit then it would be (technically) more feasible, though still highly unlikely.
Technically, `[obj] * (2**(32-4))` is a 1 GiB list on 32-bit (2**28 pointers of 4 bytes each).
The question is whether this matters. If really necessary, the PEP can demonstrate that it doesn't matter in practice. (Also, the magic value on 32-bit would be 2**29.)
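The arithmetic in the exchange above can be checked directly. This is a quick sanity check of the quoted figures, not part of the PEP:

```python
# 64-bit case: incrementing a refcount once per cycle on a 5 GHz CPU
# until it reaches the immortal range near bit 61.
cycles = 2 ** 61
seconds = cycles / 5_000_000_000   # 5 GHz -> 5e9 increments per second
days = seconds / 86_400

assert seconds > 460_000_000       # "nearly 500,000,000 seconds"
assert days > 5_000                # "over 5,000 days"

# 32-bit case: a single list of 2**28 references to one object already
# pushes its refcount into the immortal-bit range, and that list is
# only 1 GiB of pointers.
assert (2 ** 28) * 4 == 2 ** 30    # 2**28 pointers * 4 bytes = 1 GiB
```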
Constraints
-----------
* ensure that otherwise immutable objects can be truly immutable
* be careful when immortalizing objects that are not otherwise immutable
I am not sure what this means. For example, unicode objects are not truly immutable because they have a hash cache, a utf8 cache, and a wchar_t cache. (The wchar_t cache will be removed in Python 3.12.)
I think you understood it correctly. In the case of str objects, they are close enough since a race on any of those values will not cause a different outcome. I will clarify the point in the PEP.
Object Cleanup
--------------
In order to clean up all immortal objects during runtime finalization, we must keep track of them.
I don't think we need to clean up all immortal objects.
Of course, we should take care of the objects that are immortal by default. But for user-marked immortal objects, it is very difficult to guarantee that __del__ or a weakref callback is called safely.
There is no such thing as user-marked immortal objects. The concept is strictly an internal one, with no public API.
Additionally, if objects are marked immortal to avoid CoW, cleaning them up causes CoW.
Correct. The PEP does not propose to deal with that situation.
On Wed, Feb 23, 2022 at 10:12 AM Eric Snow
Thanks for the feedback. I've responded inline below.
-eric
On Sat, Feb 19, 2022 at 8:50 PM Inada Naoki
wrote:

I hope per-interpreter GIL succeeds at some point, and I know this PEP is needed for it.
But I worry that a per-interpreter GIL may be too complex for core developers and extension writers to implement and maintain. As you know, immortal does not mean shareable between interpreters. It is very difficult to know which objects can be shared, and where shareable objects leak into other interpreters. So I am not sure that a per-interpreter GIL is an achievable goal.
I plan on addressing this in the PEP I am working on for per-interpreter GIL. In the meantime, I doubt the issue will impact any core devs.
It's nice to hear!
So I think it is too early to introduce immortal objects in Python 3.11, unless they *improve* performance without a per-interpreter GIL. Instead, we could add a configuration option such as `--enable-experimental-immortal`.
I agree that immortal objects aren't quite as appealing in general without per-interpreter GIL. However, there are actual users that will benefit from it, assuming we can reduce the performance penalty to acceptable levels. For a recent example, see https://mail.python.org/archives/list/python-dev@python.org/message/B77BQQFD....
It is not a proven example, but just a hope at the moment. So an option is fine for proving the idea. Although I cannot read the code, they said they are "patching ASLR by patching `ob_type` fields". That will cause CoW for most objects, won't it? So reducing memory writes doesn't directly mean reducing CoW. Unless we can stop writing to a page completely, the page will be copied.
On Sat, Feb 19, 2022 at 4:52 PM Eric Snow
wrote:

Reducing CPU Cache Invalidation
-------------------------------

Avoiding Data Races
-------------------
Both benefits require a per-interpreter GIL.
CPU cache invalidation exists regardless. With the current GIL, the effect is reduced significantly.
It's an interesting point. We cannot see the benefit from pyperformance, because it doesn't use much data and it runs one process at a time, so pyperformance cannot put enough stress on the last-level cache, which is shared by many cores. We need a multiprocess performance benchmark, apart from pyperformance, to stress the last-level cache from multiple cores. That would help not only this PEP, but also optimization of containers like dict and set.
As I wrote before, fork is very difficult to use safely. We cannot recommend it to most users. And I don't think reducing the size of Instagram's or YouTube's patches is a good rationale for this kind of change.
What do you mean by "this kind of change"? The proposed change is relatively small. It certainly isn't nearly as intrusive as many changes we make to internals without a PEP. If you are talking about the performance penalty, we should be able to eliminate it.
Can the proposed optimizations to eliminate the penalty guarantee that no __del__ or weakref callback is broken, and that no memory leaks occur when the Python interpreter is initialized and finalized multiple times? I haven't confirmed that yet.
Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics. Anything that uses ``mmap`` relies on copy-on-write, including sharing data from shared objects files between processes.
It is very difficult to reduce CoW with mmap(MAP_PRIVATE).
You may need to write the hash of bytes and unicode objects, and you may need to write `tp_type`. Immortal objects can "reduce" memory writes. But "at least one memory write" is enough to trigger the CoW.
Correct. However, without immortal objects (AKA immutable per-object runtime-state) it goes from "very difficult" to "basically impossible".
A configuration option wouldn't make it impossible.
Constraints
-----------
* ensure that otherwise immutable objects can be truly immutable
* be careful when immortalizing objects that are not otherwise immutable
I am not sure what this means. For example, unicode objects are not truly immutable because they have a hash cache, a utf8 cache, and a wchar_t cache. (The wchar_t cache will be removed in Python 3.12.)
I think you understood it correctly. In the case of str objects, they are close enough since a race on any of those values will not cause a different outcome.
I will clarify the point in the PEP.
FWIW, I filed an issue to remove the hash cache from bytes objects:
https://github.com/faster-cpython/ideas/issues/290
Code objects contain many bytes objects (e.g. co_code, co_linetable, etc.).
Removing the cache will save some RAM and make immortal bytes objects
truly immutable and safe to share between interpreters.
--
Inada Naoki
Responses inline below.
-eric
On Tue, Feb 22, 2022 at 7:22 PM Inada Naoki
For a recent example, see https://mail.python.org/archives/list/python-dev@python.org/message/B77BQQFD....
It is not a proven example, but just a hope at the moment. So an option is fine for proving the idea.
Although I cannot read the code, they said they are "patching ASLR by patching `ob_type` fields". That will cause CoW for most objects, won't it?
So reducing memory writes doesn't directly mean reducing CoW. Unless we can stop writing to a page completely, the page will be copied.
Yeah, they would have to address that.
CPU cache invalidation exists regardless. With the current GIL, the effect is reduced significantly.
It's an interesting point. We cannot see the benefit from pyperformance, because it doesn't use much data and it runs one process at a time, so pyperformance cannot put enough stress on the last-level cache, which is shared by many cores.
We need a multiprocess performance benchmark, apart from pyperformance, to stress the last-level cache from multiple cores. That would help not only this PEP, but also optimization of containers like dict and set.
+1
Can the proposed optimizations to eliminate the penalty guarantee that no __del__ or weakref callback is broken, and that no memory leaks occur when the Python interpreter is initialized and finalized multiple times? I haven't confirmed that yet.
They will not break __del__ or weakrefs. No memory will leak after finalization. If any of that happens then it is a bug.
FWIW, I filed an issue to remove the hash cache from bytes objects: https://github.com/faster-cpython/ideas/issues/290
Code objects contain many bytes objects (e.g. co_code, co_linetable, etc.). Removing the cache will save some RAM and make immortal bytes objects truly immutable and safe to share between interpreters.
+1 Thanks!
On 19. 02. 22 8:46, Eric Snow wrote:
Thanks to all those that provided feedback. I've worked to substantially update the PEP in response. The text is included below. Further feedback is appreciated.
Thank you! This version is much clearer. I like the PEP more and more! I've sent a PR with some typo fixes: https://github.com/python/peps/pull/2348 and I have a few comments: [...]
Public Refcount Details [...] As part of this proposal, we must make sure that users can clearly understand on which parts of the refcount behavior they can rely and which are considered implementation details. Specifically, they should use the existing public refcount-related API and the only refcount value with any meaning is 0. All other values are considered "not 0".
Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code. [...]
_Py_IMMORTAL_REFCNT
-------------------
We will add two internal constants::
    #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
    #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))
As a nitpick: could you say this in prose?

* ``_Py_IMMORTAL_BIT`` has the third top-most bit set.
* ``_Py_IMMORTAL_REFCNT`` has the third and fourth top-most bits set.

[...]
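The bit arithmetic in those macros can be mirrored in Python to see exactly which bits are involved. This is a quick check, assuming a 64-bit ``Py_ssize_t``; the bit-test interpretation at the end is what the constants suggest, not text from the PEP:

```python
SIZEOF_PY_SSIZE_T = 8  # assume a 64-bit build

IMMORTAL_BIT = 1 << (8 * SIZEOF_PY_SSIZE_T - 4)       # bit 60
IMMORTAL_REFCNT = IMMORTAL_BIT + (IMMORTAL_BIT // 2)  # bits 60 and 59

assert IMMORTAL_BIT == 2 ** 60
assert IMMORTAL_REFCNT == 2 ** 60 + 2 ** 59

# The refcount value sits halfway above the immortal bit, so modest
# drift from unbalanced INCREF/DECREF still leaves the bit set.
assert IMMORTAL_REFCNT & IMMORTAL_BIT
```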
Immortal Global Objects
-----------------------
All objects that we expect to be shared globally (between interpreters) will be made immortal. That includes the following:
* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers, small ints)
All such objects will be immutable. In the case of the static types, they will be effectively immutable. ``PyTypeObject`` has some mutable state (``tp_dict`` and ``tp_subclasses``), but we can work around this by storing that state on ``PyInterpreterState`` instead of on the respective static type object. Then the ``__dict__``, etc. getter will do a lookup on the current interpreter, if appropriate, instead of using ``tp_dict``.
But tp_dict is also public C-API. How will that be handled? Perhaps naively, I thought static types' dicts could be treated as (deeply) immutable, and shared? Perhaps it would be best to leave it out here and say "The details of sharing ``PyTypeObject`` across interpreters are left to another PEP"? Even so, I'd love to know the plan. (And even if these are internals, changes to them should be mentioned in What's New, for the sake of people who need to maintain old extensions.)
Object Cleanup
--------------
In order to clean up all immortal objects during runtime finalization, we must keep track of them.
For GC objects ("containers") we'll leverage the GC's permanent generation by pushing all immortalized containers there. During runtime shutdown, the strategy will be to first let the runtime try to do its best effort of deallocating these instances normally. Most of the module deallocation will now be handled by ``pylifecycle.c:finalize_modules()``, which cleans up the remaining modules as best we can. It will change which modules are available during __del__, but that's already documented as undefined behavior. Optionally, we could do some topological ordering to guarantee that user modules are deallocated before the stdlib modules. Finally, anything leftover (if any) can be found through the permanent generation gc list, which we can clear after finalize_modules().
For non-container objects, the tracking approach will vary on a case-by-case basis. In nearly every case, each such object is directly accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or ``PyInterpreterState`` field. We may need to add a tracking mechanism to the runtime state for a small number of objects.
Out of curiosity: How does this extra work affect performance? Is it part of the 4% slowdown?

And from the other thread:

On 17. 02. 22 18:23, Eric Snow wrote:
On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin
wrote:

Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together?
I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
Please think about it. If you removed the benefits for per-interpreter GIL, the motivation section would be reduced to memory savings for fork/CoW. (And lots of performance improvements that are great in theory but sum up to a 4% loss.)
Sounds good. Would this involve more than a note at the top of the PEP?
No, a note would work great. If you read the motivation carefully, it's (IMO) clear that it's rather weak without the other PEP. But that realization shouldn't come as a surprise to the reader.
And just to be clear, I don't think the fate of a per-interpreter GIL PEP should depend on this one.
I think that's clear. It's the other way around: the fate of this PEP will probably depend on the per-interpreter GIL one.
Petr Viktorin wrote:
Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.
For what it's worth, Cython does this for string concatenation, to concatenate in place if possible (this optimization was copied from CPython). It could be disabled relatively easily if it became a problem (it's already CPython-only and version-checked, so it would just need another upper-bound version check).
On Mon, Feb 21, 2022 at 10:56 AM
For what it's worth, Cython does this for string concatenation, to concatenate in place if possible (this optimization was copied from CPython). It could be disabled relatively easily if it became a problem (it's already CPython-only and version-checked, so it would just need another upper-bound version check).
That's good to know. -eric
On 2/21/2022 11:11 AM, Petr Viktorin wrote:
On 19. 02. 22 8:46, Eric Snow wrote:
As part of this proposal, we must make sure that users can clearly understand on which parts of the refcount behavior they can rely and which are considered implementation details. Specifically, they should use the existing public refcount-related API and the only refcount value with any meaning is 0. All other values are considered "not 0".
Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.
We could say that the only refcounts with any meaning are 0, 1, and > 1.
-- Terry Jan Reedy
Thanks for the responses. I've replied inline below.
-eric
On Mon, Feb 21, 2022 at 9:11 AM Petr Viktorin
On 19. 02. 22 8:46, Eric Snow wrote:
Thanks to all those that provided feedback. I've worked to substantially update the PEP in response. The text is included below. Further feedback is appreciated.
Thank you! This version is much clearer. I like the PEP more and more!
Great!
I've sent a PR with a some typo fixes: https://github.com/python/peps/pull/2348
Thank you.
Public Refcount Details [...] As part of this proposal, we must make sure that users can clearly understand on which parts of the refcount behavior they can rely and which are considered implementation details. Specifically, they should use the existing public refcount-related API and the only refcount value with any meaning is 0. All other values are considered "not 0".
Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.
Good point. As Terry suggested, we could also let 1 have meaning. Regardless, any documented restriction would only apply to users of the public C-API, not to internal code.
_Py_IMMORTAL_REFCNT
-------------------
We will add two internal constants::
    #define _Py_IMMORTAL_BIT (1LL << (8 * sizeof(Py_ssize_t) - 4))
    #define _Py_IMMORTAL_REFCNT (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT / 2))
As a nitpick: could you say this in prose?
* ``_Py_IMMORTAL_BIT`` has the third top-most bit set.
* ``_Py_IMMORTAL_REFCNT`` has the third and fourth top-most bits set.
Sure.
Immortal Global Objects
-----------------------
All objects that we expect to be shared globally (between interpreters) will be made immortal. That includes the following:
* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers, small ints)
All such objects will be immutable. In the case of the static types, they will be effectively immutable. ``PyTypeObject`` has some mutable state (``tp_dict`` and ``tp_subclasses``), but we can work around this by storing that state on ``PyInterpreterState`` instead of on the respective static type object. Then the ``__dict__``, etc. getter will do a lookup on the current interpreter, if appropriate, instead of using ``tp_dict``.
But tp_dict is also public C-API. How will that be handled? Perhaps naively, I thought static types' dicts could be treated as (deeply) immutable, and shared?
They are immutable from Python code but not from C (due to tp_dict). Basically, we will document that tp_dict should not be used directly (in the public API) and refer users to a public getter function. I'll note this in the PEP.
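At the Python level, static types' dicts are already shielded from mutation, which is why this change is invisible there; only direct C-level ``tp_dict`` access is affected. A quick illustration of the existing behavior:

```python
# A built-in (static) type's __dict__ is exposed as a read-only
# mappingproxy, so Python code cannot mutate it even today.
d = int.__dict__
assert type(d).__name__ == 'mappingproxy'

try:
    d['spam'] = 1      # item assignment is not supported
except TypeError:
    mutable = False
else:
    mutable = True
assert not mutable
```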
Perhaps it would be best to leave it out here and say "The details of sharing ``PyTypeObject`` across interpreters are left to another PEP"? Even so, I'd love to know the plan.
What else would you like to know? There isn't much to it. For each of the builtin static types we will keep the relevant mutable state on PyInterpreterState and look it up there in the relevant getters (e.g. __dict__ and __subclasses__).
(And even if these are internals, changes to them should be mentioned in What's New, for the sake of people who need to maintain old extensions.)
+1
Object Cleanup
--------------
In order to clean up all immortal objects during runtime finalization, we must keep track of them.
For GC objects ("containers") we'll leverage the GC's permanent generation by pushing all immortalized containers there. During runtime shutdown, the strategy will be to first let the runtime try to do its best effort of deallocating these instances normally. Most of the module deallocation will now be handled by ``pylifecycle.c:finalize_modules()``, which cleans up the remaining modules as best we can. It will change which modules are available during __del__, but that's already documented as undefined behavior. Optionally, we could do some topological ordering to guarantee that user modules are deallocated before the stdlib modules. Finally, anything leftover (if any) can be found through the permanent generation gc list, which we can clear after finalize_modules().
For non-container objects, the tracking approach will vary on a case-by-case basis. In nearly every case, each such object is directly accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or ``PyInterpreterState`` field. We may need to add a tracking mechanism to the runtime state for a small number of objects.
Out of curiosity: How does this extra work affect performance? Is it part of the 4% slowdown?
The slowdown is exclusively due to the change to Py_INCREF() and Py_DECREF(). If there are any objects that must be specially tracked, that will have insignificant performance impact.
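The "permanent generation" mentioned above already exists and is reachable from Python via the ``gc`` module: ``gc.freeze()`` moves all surviving tracked objects into it. This sketch only pokes at that machinery to show it is observable; it does not immortalize anything:

```python
import gc

before = gc.get_freeze_count()
gc.freeze()                      # move all tracked objects to the permanent generation
frozen = gc.get_freeze_count()
assert frozen > before           # the permanent generation is now populated

gc.unfreeze()                    # move them back into the oldest generation
assert gc.get_freeze_count() == 0
```

Objects in the permanent generation are never collected by the cyclic GC, which is what makes it a natural place to park immortalized containers until a final sweep at shutdown.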
And from the other thread:
On 17. 02. 22 18:23, Eric Snow wrote:
On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin
wrote:

Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together?
I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
Please think about it. If you removed the benefits for per-interpreter GIL, the motivation section would be reduced to memory savings for fork/CoW. (And lots of performance improvements that are great in theory but sum up to a 4% loss.)
Sounds good. Would this involve more than a note at the top of the PEP?
No, a note would work great. If you read the motivation carefully, it's (IMO) clear that it's rather weak without the other PEP. But that realization shouldn't come as a surprise to the reader.
Having thought about it some more, I don't think this PEP should be strictly bound to per-interpreter GIL. That is certainly my personal motivation. However, we have a small set of users that would benefit significantly, the change is relatively small and simple, and the risk of breaking users is also small. In fact, we regularly have more disruptive changes to internals that do not require a PEP. So it seems like the bar should be pretty low for this one (assuming we get the performance penalty low enough). If it were some massive or broadly impactful (or even clearly public) change then I suppose you could call the motivation weak. However, this isn't that sort of PEP. Honestly, it might not have needed a PEP in the first place if I had been a bit more clear about the idea earlier.
On 23. 02. 22 2:46, Eric Snow wrote:
Thanks for the responses. I've replied inline below.
Same here :)
Immortal Global Objects
-----------------------
All objects that we expect to be shared globally (between interpreters) will be made immortal. That includes the following:
* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers, small ints)
All such objects will be immutable. In the case of the static types, they will be effectively immutable. ``PyTypeObject`` has some mutable state (``tp_dict`` and ``tp_subclasses``), but we can work around this by storing that state on ``PyInterpreterState`` instead of on the respective static type object. Then the ``__dict__``, etc. getter will do a lookup on the current interpreter, if appropriate, instead of using ``tp_dict``.
But tp_dict is also public C-API. How will that be handled? Perhaps naively, I thought static types' dicts could be treated as (deeply) immutable, and shared?
They are immutable from Python code but not from C (due to tp_dict). Basically, we will document that tp_dict should not be used directly (in the public API) and refer users to a public getter function. I'll note this in the PEP.
What worries me is that existing users of the API haven't read the new documentation. What will happen if users do use it? Or worse, add things to it? (Hm, the current docs are already rather confusing -- 3.2 added a note that "It is not safe to ... modify tp_dict with the dictionary C-API.", but above that it says "extra attributes for the type may be added to this dictionary [in some cases]") [...]
And from the other thread:
On 17. 02. 22 18:23, Eric Snow wrote:
On Thu, Feb 17, 2022 at 3:42 AM Petr Viktorin
wrote:

Weren't you planning a PEP on subinterpreter GIL as well? Do you want to submit them together?
I'd have to think about that. The other PEP I'm writing for per-interpreter GIL doesn't require immortal objects. They just simplify a number of things. That's my motivation for writing this PEP, in fact. :)
Please think about it. If you removed the benefits for per-interpreter GIL, the motivation section would be reduced to memory savings for fork/CoW. (And lots of performance improvements that are great in theory but sum up to a 4% loss.)
Sounds good. Would this involve more than a note at the top of the PEP?
No, a note would work great. If you read the motivation carefully, it's (IMO) clear that it's rather weak without the other PEP. But that realization shouldn't come as a surprise to the reader.
Having thought about it some more, I don't think this PEP should be strictly bound to per-interpreter GIL. That is certainly my personal motivation. However, we have a small set of users that would benefit significantly, the change is relatively small and simple, and the risk of breaking users is also small. In fact, we regularly have more disruptive changes to internals that do not require a PEP.
Right, with the recent performance improvements it's looking like it might stand on its own after all.
So it seems like the bar should be pretty low for this one (assuming we get the performance penalty low enough). If it were some massive or broadly impactful (or even clearly public) change then I suppose you could call the motivation weak. However, this isn't that sort of PEP. Honestly, it might not have needed a PEP in the first place if I had been a bit more clear about the idea earlier.
Maybe it's good to have a PEP to clear that up :)
On Wed, Feb 23, 2022 at 8:19 AM Petr Viktorin
On 23. 02. 22 2:46, Eric Snow wrote:
[SNIP]
So it seems like the bar should be pretty low for this one (assuming we get the performance penalty low enough). If it were some massive or broadly impactful (or even clearly public) change then I suppose you could call the motivation weak. However, this isn't that sort of PEP.
Yes, but PEPs are not just about complexity, but also about impact on users. And "impact" covers backwards compatibility, which includes performance regressions (i.e. making Python slower means it may no longer be a viable option for someone with specific performance requirements). So with the initial 4% performance regression it made sense to write a PEP.
On Wed, Feb 23, 2022 at 9:16 AM Petr Viktorin
But tp_dict is also public C-API. How will that be handled? Perhaps naively, I thought static types' dicts could be treated as (deeply) immutable, and shared?
They are immutable from Python code but not from C (due to tp_dict). Basically, we will document that tp_dict should not be used directly (in the public API) and refer users to a public getter function. I'll note this in the PEP.
What worries me is that existing users of the API haven't read the new documentation. What will happen if users do use it? Or worse, add things to it?
We will probably set it to NULL, so the user code would fail or crash. I suppose we could set it to a dummy object that emits helpful errors. However, I don't think that is worth it. We're talking about cases where users directly access the tp_dict of the builtin static types, not their own. That is already something they should definitely not be doing.
(Hm, the current docs are already rather confusing -- 3.2 added a note that "It is not safe to ... modify tp_dict with the dictionary C-API.", but above that it says "extra attributes for the type may be added to this dictionary [in some cases]")
Yeah, the docs will have to be clarified.
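The per-interpreter storage Eric describes (moving mutable state like ``tp_dict`` off the shared static type and onto the interpreter) can be modeled roughly in Python. This is purely an illustrative sketch; the names (``per_interp_state``, ``get_type_dict``) are invented here and are not CPython's actual implementation, which is in C:

```python
# Sketch only: instead of one shared mutable dict hanging off the
# immortal static type object, each interpreter keeps its own copy
# of the type's mutable state.
per_interp_state = {}   # {interp_id: {type_name: type_dict}}

def get_type_dict(interp_id, type_name):
    # What a public tp_dict getter would do: consult the current
    # interpreter's state rather than the shared type object.
    interp = per_interp_state.setdefault(interp_id, {})
    return interp.setdefault(type_name, {})

# Two interpreters mutating "the same" static type's dict stay isolated:
get_type_dict(1, "str")["greeting"] = "hello from interp 1"
get_type_dict(2, "str")["greeting"] = "hello from interp 2"
assert get_type_dict(1, "str")["greeting"] == "hello from interp 1"
assert get_type_dict(2, "str")["greeting"] == "hello from interp 2"
```

This also shows why code that pokes at ``tp_dict`` directly would miss the per-interpreter state: only the getter knows where to look.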
Having thought about it some more, I don't think this PEP should be strictly bound to per-interpreter GIL. That is certainly my personal motivation. However, we have a small set of users that would benefit significantly, the change is relatively small and simple, and the risk of breaking users is also small.
Right, with the recent performance improvements it's looking like it might stand on its own after all.
Great!
Honestly, it might not have needed a PEP in the first place if I had been a bit more clear about the idea earlier.
Maybe it's good to have a PEP to clear that up :)
Yeah, the PEP process has been helpful for that. :) -eric
On Mon, Feb 21, 2022 at 5:18 PM Petr Viktorin
wrote: Should we care about hacks/optimizations that rely on having the only reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.
FWIW, a real world example of this is numpy.ndarray.resize(..., refcheck=True): https://numpy.org/doc/stable/reference/generated/numpy.ndarray.resize.html#n... https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/shape.c#L...

When refcheck=True (the default), numpy raises an error if you try to resize an array inplace whose refcnt > 2 (although I don't understand why > 2 and not > 1, and the docs aren't very clear about this).

That said, relying on the exact value of the refcnt is very bad for alternative implementations and for HPy, and in particular it is impossible to implement ndarray.resize(refcheck=True) correctly on PyPy. So from this point of view, a wording which explicitly restricts the "legal" usage of the refcnt details would be very welcome.
On Thu, 2022-02-24 at 00:21 +0100, Antonio Cuni wrote:
On Mon, Feb 21, 2022 at 5:18 PM Petr Viktorin
wrote: Should we care about hacks/optimizations that rely on having the only
reference (or all references), e.g. mutating a tuple if it has refcount 1? Immortal objects shouldn't break them (the special case simply won't apply), but this wording would make them illegal. AFAIK CPython uses this internally, but I don't know how prevalent/useful it is in third-party code.
FWIW, a real world example of this is numpy.ndarray.resize(..., refcheck=True): https://numpy.org/doc/stable/reference/generated/numpy.ndarray.resize.html#n... https://github.com/numpy/numpy/blob/main/numpy/core/src/multiarray/shape.c#L...
When refcheck=True (the default), numpy raises an error if you try to resize an array inplace whose refcnt > 2 (although I don't understand why > 2 and not > 1, and the docs aren't very clear about this).
That said, relying on the exact value of the refcnt is very bad for alternative implementations and for HPy, and in particular it is impossible to implement ndarray.resize(refcheck=True) correctly on PyPy. So from this point of view, a wording which explicitly restricts the "legal" usage of the refcnt details would be very welcome.
Yeah, NumPy resizing is a bit of an awkward point; I would be on-board with just replacing resize. NumPy does also have a bit of magic akin to the "string concat" trick for operations like ``a + b + c``, where it will use the knowledge that it can mutate/reuse the temporary array, effectively doing ``tmp = a + b; tmp += c`` (which requires some stack-walking magic in addition to the refcount!). Cheers, Sebastian
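The temporary elision Sebastian mentions can be shown conceptually with lists (a sketch of the idea only; numpy does this in C for arrays and additionally walks the stack to prove the operand really is an unobservable temporary):

```python
# What the elision effectively rewrites `a + b + c` into: the result of
# `a + b` is a temporary with no other references, so the second
# addition may mutate it in place instead of allocating a third object.
a, b, c = [1, 2], [3], [4]

tmp = a + b          # fresh object; `tmp` is its only reference
before = id(tmp)
tmp += c             # in-place: list.__iadd__ reuses the same object
assert id(tmp) == before
assert tmp == [1, 2, 3, 4]
assert a == [1, 2]   # the original operands are untouched
```

An optimization like this is only sound if the "sole reference" test is reliable, which is why fixed-refcount immortal objects must be careful not to ever look like refcount-1 temporaries.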
On Wed, Feb 23, 2022 at 4:21 PM Antonio Cuni
When refcheck=True (the default), numpy raises an error if you try to resize an array inplace whose refcnt > 2 (although I don't understand why > 2 and not > 1, and the docs aren't very clear about this).
That said, relying on the exact value of the refcnt is very bad for alternative implementations and for HPy, and in particular it is impossible to implement ndarray.resize(refcheck=True) correctly on PyPy. So from this point of view, a wording which explicitly restricts the "legal" usage of the refcnt details would be very welcome.
Thanks for the feedback and example. It helps. -eric
On Sat, Feb 19, 2022 at 12:46 AM Eric Snow
Performance -----------
A naive implementation shows `a 4% slowdown`_. Several promising mitigation strategies will be pursued in the effort to bring it closer to performance-neutral. See the `mitigation`_ section below.
FYI, Eddie has been able to get us back to performance-neutral after applying several of the mitigation strategies we discussed. :) -eric
On 2/22/22 6:00 PM, Eric Snow wrote:
On Sat, Feb 19, 2022 at 12:46 AM Eric Snow
wrote: Performance -----------
A naive implementation shows `a 4% slowdown`_. Several promising mitigation strategies will be pursued in the effort to bring it closer to performance-neutral. See the `mitigation`_ section below. FYI, Eddie has been able to get us back to performance-neutral after applying several of the mitigation strategies we discussed. :)
Are these optimizations specifically for the PR, or are these optimizations we could apply without taking the immortal objects? Kind of like how Sam tried to offset the nogil slowdown by adding optimizations that we went ahead and added anyway ;-) //arry/
On Tue, Feb 22, 2022, 20:26 Larry Hastings
Are these optimizations specifically for the PR, or are these optimizations we could apply without taking the immortal objects? Kind of like how Sam tried to offset the nogil slowdown by adding optimizations that we went ahead and added anyway ;-)
Basically all the optimizations require immortal objects. -eric
participants (9): Antonio Cuni, Brett Cannon, dw-git@d-woods.co.uk, Eric Snow, Inada Naoki, Larry Hastings, Petr Viktorin, Sebastian Berg, Terry Reedy