"immortal" objects and how they would help per-interpreter GIL
Most of the work toward interpreter isolation and a per-interpreter GIL involves moving static global variables to _PyRuntimeState or PyInterpreterState (or module state). Through the effort of quite a few people, we've made good progress. However, many globals still remain, with the majority being objects and most of those being static strings (e.g. _Py_Identifier), static types (incl. exceptions), and singletons.

On top of that, a number of those objects are exposed in the public C-API and even in the limited API. :( Dealing with this specifically is probably the trickiest thing I've had to work through in this project.

There is one solution that would help both of the above in a nice way: "immortal" objects.

The idea of objects that never get deallocated isn't new and has been explored here several times. Not that long ago I tried it out by setting the refcount really high. That worked. Around the same time Eddie Elizondo at Facebook did something similar but modified Py_INCREF() and Py_DECREF() to keep the refcount from changing. Our solutions were similar but with different goals in mind. (Facebook wants to avoid copy-on-write in their pre-fork model.)

A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.

Recently I've come back to the idea of immortal objects because it's much simpler than the alternate (working) solution I found. So how do we get around that performance penalty? Let's say it makes CPython 5% slower. We have some options:

* live with the full penalty
* make other changes to reduce the penalty to a more acceptable threshold than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
* abandon all hope

Mark Shannon suggested to me some things we can do. Also, from a recent conversation with Dino Viehland it sounds like Eddie was able to reach performance-neutral with a few techniques. So here are some things we can do to reduce or eliminate that penalty:

* reduce refcount operations on high-activity objects (e.g. None, True, False)
* reduce refcount operations in general
* walk the heap at the end of runtime initialization and mark all objects as immortal
* mark all global objects as immortal (statics or in _PyRuntimeState; for PyInterpreterState not needed)

What do you think? Does this sound realistic? Are there additional things we can do to counter that penalty?

-eric
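For concreteness, a rough, untested sketch of the two approaches mentioned above (the helper names and the sentinel refcount value are made up for illustration; neither is the actual patch):

    #include <Python.h>

    /* Approach 1: set the refcount so high it can never realistically
       drop to zero. */
    #define VERY_HIGH_REFCNT ((Py_ssize_t)1 << 40)   /* illustrative value */

    static void make_immortal(PyObject *op)
    {
        op->ob_refcnt = VERY_HIGH_REFCNT;
    }

    /* Approach 2: keep the refcount of immortal objects from changing at
       all, at the cost of a branch on every incref/decref. */
    static inline void incref_with_immortal_check(PyObject *op)
    {
        if (op->ob_refcnt != VERY_HIGH_REFCNT) {
            op->ob_refcnt++;
        }
    }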
On Tue, Dec 14, 2021 at 10:23 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Most of the work toward interpreter isolation and a per-interpreter GIL involves moving static global variables to _PyRuntimeState or PyInterpreterState (or module state). Through the effort of quite a few people, we've made good progress. However, many globals still remain, with the majority being objects and most of those being static strings (e.g. _Py_Identifier), static types (incl. exceptions), and singletons.
On top of that, a number of those objects are exposed in the public C-API and even in the limited API. :( Dealing with this specifically is probably the trickiest thing I've had to work through in this project.
There is one solution that would help both of the above in a nice way: "immortal" objects.
The idea of objects that never get deallocated isn't new and has been explored here several times. Not that long ago I tried it out by setting the refcount really high. That worked. Around the same time Eddie Elizondo at Facebook did something similar but modified Py_INCREF() and Py_DECREF() to keep the refcount from changing. Our solutions were similar but with different goals in mind. (Facebook wants to avoid copy-on-write in their pre-fork model.)
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Recently I've come back to the idea of immortal objects because it's much simpler than the alternate (working) solution I found. So how do we get around that performance penalty? Let's say it makes CPython 5% slower. We have some options:
* live with the full penalty
* make other changes to reduce the penalty to a more acceptable threshold than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
* abandon all hope
Mark Shannon suggested to me some things we can do. Also, from a recent conversation with Dino Viehland it sounds like Eddie was able to reach performance-neutral with a few techniques. So here are some things we can do to reduce or eliminate that penalty:
* reduce refcount operations on high-activity objects (e.g. None, True, False)
* reduce refcount operations in general
* walk the heap at the end of runtime initialization and mark all objects as immortal
* mark all global objects as immortal (statics or in _PyRuntimeState; for PyInterpreterState not needed)
What do you think? Does this sound realistic? Are there additional things we can do to counter that penalty?
There's also the concern of memory usage if these immortal objects are never collected.

But *which* objects are immortal? You only listed None, True, and False. Otherwise assume/remember I'm management and provide a list and/or link of what would get marked as immortal so we can have an idea of the memory impact.
On Tue, Dec 14, 2021 at 4:09 PM Brett Cannon <brett@python.org> wrote:
There's also the concern of memory usage if these immortal objects are never collected.
But which objects are immortal? You only listed None, True, and False. Otherwise assume/remember I'm management and provide a list and/or link of what would get marked as immortal so we can have an idea of the memory impact.
Pretty much we would mark any object as immortal which would exist for the lifetime of the runtime (or the respective interpreter in some cases). So currently that would include the global singletons (None, True, False, small ints, empty tuple, etc.) and the static types. We would likely also include cached strings (_Py_Identifier, interned, etc.).

From another angle: I'm working on static allocation for nearly all the objects currently dynamically allocated during runtime/interpreter init. All of them would be marked immortal. This is similar to the approach taken by Eddie with walking the heap and marking all objects found.

-eric
How common is it to reload a module in production code? It seems like "object created at the module level" (excluding __main__) is at least as good a heuristic for immortality as "string that meets the syntactic requirements for an identifier". Perhaps also anything created as part of class creation (as opposed to instance initialization).

-jJ
On 14/12/2021 19.19, Eric Snow wrote:
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example we could use the highest bit of the ref count as an immutable indicator and do something like

    ob_refcnt += !(ob_refcnt >> 63)

instead of

    ob_refcnt++

The code performs "ob_refcnt += 1" when the highest bit is not set and "ob_refcnt += 0" when the bit is set. I have neither tested whether the approach actually works nor measured its performance.

Christian
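A minimal, untested sketch of that branchless form as a pair of helpers (bit 62 is used here instead of 63 so ob_refcnt stays non-negative; this is illustrative only, not a proposed patch):

    #include <Python.h>

    #define IMMORTAL_BIT 62   /* illustrative choice of flag bit */

    /* Adding the negated flag bumps the count by 1 for ordinary objects
       and by 0 for immortal ones, with no conditional branch. */
    static inline void branchless_incref(PyObject *op)
    {
        op->ob_refcnt += !(op->ob_refcnt >> IMMORTAL_BIT);
    }

    static inline void branchless_decref_body(PyObject *op)
    {
        op->ob_refcnt -= !(op->ob_refcnt >> IMMORTAL_BIT);
        /* the usual "deallocate when the count reaches zero" step is omitted */
    }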
One thing to consider: ideally, immortal objects should not participate in the GC. There is nothing inherently wrong if they do, but we would need to update the GC (and therefore add more branching in possible hot paths) to deal with these, as the algorithm requires the refcount to be exact to correctly compute the cycles. On Wed, 15 Dec 2021, 09:43 Christian Heimes, <christian@python.org> wrote:
On 14/12/2021 19.19, Eric Snow wrote:
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example we could use the highest bit of the ref count as an immutable indicator and do something like
ob_refcnt += !(ob_refcnt >> 63)
instead of
ob_refcnt++
The code performs "ob_refcnt += 1" when the highest bit is not set and "ob_refcnt += 0" when the bit is set. I have neither tested whether the approach actually works nor measured its performance.
Christian
On Wed, Dec 15, 2021 at 2:50 AM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
One thing to consider: ideally, immortal objects should not participate in the GC. There is nothing inherently wrong if they do, but we would need to update the GC (and therefore add more branching in possible hot paths) to deal with these, as the algorithm requires the refcount to be exact to correctly compute the cycles.
That's a good point. Do static types and the global singletons already opt out of GC participation? -eric
All singletons do, AFAIK. And most static types that I can think of also do, even the empty tuple. On Wed, 15 Dec 2021 at 16:49, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Wed, Dec 15, 2021 at 2:50 AM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
One thing to consider: ideally, immortal objects should not participate in the GC. There is nothing inherently wrong if they do, but we would need to update the GC (and therefore add more branching in possible hot paths) to deal with these, as the algorithm requires the refcount to be exact to correctly compute the cycles.
That's a good point. Do static types and the global singletons already opt out of GC participation?
-eric
Immortal objects shouldn't be reclaimed by garbage collection, but they still count as potential external roots for non-cyclic liveness. -jJ
On Wed, Dec 15, 2021 at 6:57 PM Jim J. Jewett <jimjjewett@gmail.com> wrote:
Immortal objects shouldn't be reclaimed by garbage collection, but they still count as potential external roots for non-cyclic liveness.
So everything referenced by an immortal object should also be made immortal -- even its type. Hence immortal objects must be immutable. (There's an issue with making types immutable that we need to address though.)

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Guido van Rossum wrote:
On Wed, Dec 15, 2021 at 6:57 PM Jim J. Jewett jimjjewett@gmail.com wrote:
Immortal objects shouldn't be reclaimed by garbage collection, but they still count as potential external roots for non-cyclic liveness.

So everything referenced by an immortal object should also be made immortal
Why? As long as you can get a list of all immortal objects (and a traversal function from each), this is just an extra step (annoying, but tolerable) that removes a bunch of objects from the pool of potential garbage before you even begin looking for cycles.
-- even its type. Hence immortal objects must be immutable.
This is probably a good idea, since avoiding changes also avoids races and Copy on Write and cache propagation, etc ... but I don't see why it is *needed*, rather than helpful. -jJ
It’s *needed* when multiple interpreters share them. On Thu, Dec 16, 2021 at 14:03 Jim J. Jewett <jimjjewett@gmail.com> wrote:
Guido van Rossum wrote:
On Wed, Dec 15, 2021 at 6:57 PM Jim J. Jewett jimjjewett@gmail.com wrote:
Immortal objects shouldn't be reclaimed by garbage collection, but they still count as potential external roots for non-cyclic liveness.

So everything referenced by an immortal object should also be made immortal
Why? As long as you can get a list of all immortal objects (and a traversal function from each), this is just an extra step (annoying, but tolerable) that removes a bunch of objects from the pool of potential garbage before you even begin looking for cycles.
-- even its type. Hence immortal objects must be immutable.
This is probably a good idea, since avoiding changes also avoids races and Copy on Write and cache propagation, etc ... but I don't see why it is *needed*, rather than helpful.
-jJ
-- --Guido (mobile)
Why are immutability and transitive immortality needed to share an object across interpreters?

Are you assuming that a change in one interpreter should not be seen by others? (Typical case, but not always true.)

Or are you saying that there is a technical problem such that a change -- even just to the reference count of a referenced string or something -- would cause data corruption? (If so, could you explain why, or at least point me in the general direction?)

-jJ
On Sat, 18 Dec 2021 08:40:57 -0000 "Jim J. Jewett" <jimjjewett@gmail.com> wrote:
Why are Immutability and transitive Immortality needed to share an object across interpreters?
Immutability is a functional requirement: you don't want one interpreter to be able to change the state of another one by mistake. Unlike multi-threading, where shared mutable state is a feature, a multi-interpreter setup is defined by full semantic isolation between interpreters (even if some structures may technically be shared under the hood: e.g. process-wide immortal immutable objects).

As for transitive immortality, it is just a necessary effect of immortality: if an object is immortal, by construction all the objects that it references will also be immortal. For example, if you decide that the tuple `("foo", "bar")` is immortal, then the "foo" and "bar" strings will also be *de facto* immortal, even if they are not explicitly marked as such.

Regards

Antoine.
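If one did want to make that closure explicit (mark an immortal object, its type, and everything it references, as Guido suggests above), a rough sketch using tp_traverse could look like this; is_immortal() and make_immortal() are hypothetical helpers standing in for whatever marker is ultimately chosen, not existing CPython API:

    #include <Python.h>

    /* hypothetical helpers, using the very-high-refcount convention */
    #define VERY_HIGH_REFCNT ((Py_ssize_t)1 << 40)
    static int  is_immortal(PyObject *op)   { return op->ob_refcnt >= VERY_HIGH_REFCNT; }
    static void make_immortal(PyObject *op) { op->ob_refcnt = VERY_HIGH_REFCNT; }

    static void make_immortal_recursive(PyObject *op);

    static int visit_mark(PyObject *op, void *arg)
    {
        (void)arg;
        make_immortal_recursive(op);
        return 0;
    }

    static void make_immortal_recursive(PyObject *op)
    {
        if (is_immortal(op)) {
            return;                 /* already handled; also terminates on cycles */
        }
        make_immortal(op);
        /* the type must outlive its instances, so mark it too */
        make_immortal_recursive((PyObject *)Py_TYPE(op));
        /* visit whatever the object references; only GC-aware types provide
           tp_traverse (e.g. a tuple's items are reached, a plain string has
           nothing to visit), so non-GC containers would need other handling */
        if (Py_TYPE(op)->tp_traverse != NULL) {
            (void)Py_TYPE(op)->tp_traverse(op, visit_mark, NULL);
        }
    }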
Are you assuming that a change in one interpreter should not be seen by others? (Typical case, but not always true.)
Or are you saying that there is a technical problem such that a change -- even just to the reference count of a referenced string or something -- would cause data corruption? (If so, could you explain why, or at least point me in the general direction?)
-jJ
On Wed, 15 Dec 2021 10:42:17 +0100 Christian Heimes <christian@python.org> wrote:
On 14/12/2021 19.19, Eric Snow wrote:
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example we could use the highest bit of the ref count as an immutable indicator and do something like
ob_refcnt += !(ob_refcnt >> 63)
instead of
ob_refcnt++
Probably, but that would also issue spurious writes to immortal refcounts from different threads at once, so might end up worse performance-wise.

Regards

Antoine.
On Wed, Dec 15, 2021 at 2:21 AM Antoine Pitrou <antoine@python.org> wrote:
On Wed, 15 Dec 2021 10:42:17 +0100 Christian Heimes <christian@python.org> wrote:
On 14/12/2021 19.19, Eric Snow wrote:
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example we could use the highest bit of the ref count as an immutable indicator and do something like
ob_refcnt += !(ob_refcnt >> 63)
instead of
ob_refcnt++
Probably, but that would also issue spurious writes to immortal refcounts from different threads at once, so might end up worse performance-wise.
Unless the CPU is clever enough to skip claiming the cacheline in exclusive-mode for a "+= 0". Which I guess is something you'd have to check empirically on every microarch and instruction pattern you care about, because there's no way it's documented. But maybe? CPUs are very smart, except when they aren't.

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Wed, Dec 15, 2021 at 2:42 AM Christian Heimes <christian@python.org> wrote:
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example we could use the highest bit of the ref count as an immutable indicator and do something like
As Antoine pointed out, wouldn't that cause too much cache invalidation between threads, especially for None, True, and False? That's the main reason I abandoned my previous effort (https://github.com/ericsnowcurrently/cpython/pull/9).

-eric
On Tue, Dec 14, 2021 at 7:27 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
We have some options:
* live with the full penalty
* make other changes to reduce the penalty to a more acceptable threshold than 5%
* eliminate the penalty (e.g. claw back 5% elsewhere)
The last time I saw a benchmark on immortal objects, it was clearly 10% slower overall on the pyperformance benchmark suite. That's a major slowdown.
* abandon all hope
I wrote https://bugs.python.org/issue39511 and https://github.com/python/cpython/pull/18301 to have per-interpreter None, True and False singletons. My change is backward compatible on the C API: you can still use "Py_None" in your C code. The code gets the singleton object from the current interpreter with a function call:

    #define Py_None Py_GetNone()

Py_GetNone() is implemented as: "return _PyInterpreterState_GET()->none;"

If _PyInterpreterState_GET() is modified to read a thread-local state, similar to the on-going work to get the Python thread state from a thread-local variable, Py_GetNone() should be "cheap", but I didn't run a benchmark.

While I was working on this issue, I was fighting against other challenges caused by subinterpreters. I fixed some of them since that time.

By the way, I made the _Py_IDENTIFIER() API and _PyUnicode_FromId() compatible with subinterpreters in Python 3.10. This change caused a subtle regression when using subinterpreters (because of an optimization based on an assumption about interned strings which is no longer true). The fix is trivial but I haven't written it yet: https://bugs.python.org/issue46006

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
It might be worth (re)reviewing Sam Gross's nogil effort to see how he approached this:

https://github.com/colesbury/nogil#readme

He goes into plenty of detail in his design document about how he deals with immortal objects. From that document:

    Some objects, such as interned strings, small integers, statically
    allocated PyTypeObjects, and the True, False, and None objects stay
    alive for the lifetime of the program. These objects are marked as
    immortal by setting the least-significant bit of the local reference
    count field (bit 0). The Py_INCREF and Py_DECREF macros are no-ops
    for these objects.

Skip
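In simplified form, the scheme quoted above looks roughly like this (nogil actually splits the count into local and shared fields and uses biased reference counting, so treat this as a sketch of the marking trick only):

    #include <Python.h>

    #define IMMORTAL_FLAG 0x1   /* least-significant bit of the refcount field */

    static inline int is_immortal(PyObject *op)
    {
        return (op->ob_refcnt & IMMORTAL_FLAG) != 0;
    }

    static inline void sketch_incref(PyObject *op)
    {
        if (!is_immortal(op)) {
            op->ob_refcnt += 2;   /* ordinary counts live above the flag bit */
        }
    }

    static inline void sketch_decref(PyObject *op)
    {
        if (!is_immortal(op)) {
            op->ob_refcnt -= 2;
            /* deallocation when the count reaches zero is omitted here */
        }
    }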
On Wed, Dec 15, 2021 at 4:03 AM Victor Stinner <vstinner@python.org> wrote:
The last time I saw a benchmark on immortal object, it was clearly 10% slower overall on the pyperformance benchmark suite. That's a major slowdown.
Yes, I plan on benchmarking the change as soon as we can run pyperformance on main.
* abandon all hope
I wrote https://bugs.python.org/issue39511 and https://github.com/python/cpython/pull/18301 to have per-interpreter None, True and False singletons.
Yeah, I took a similar approach in the alternative to immortal objects that I prototyped.
By the way, I made the _Py_IDENTIFIER() API and _PyUnicode_FromId() compatible with subinterpreters in Python 3.10. This change caused a subtle regression when using subinterpreters (because of an optimization based on an assumption about interned strings which is no longer true). The fix is trivial but I haven't written it yet: https://bugs.python.org/issue46006
FYI, I'm looking into statically allocating (and initializing) all the string objects currently using _Py_IDENTIFIER(). -eric
On Wed, Dec 15, 2021 at 10:15 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Yes, I plan on benchmarking the change as soon as we can run pyperformance on main.
I just ran the benchmarks and the PR makes CPython 4% slower. See https://github.com/python/cpython/pull/19474#issuecomment-1032944709. -eric
On Wed, Dec 15, 2021 at 3:07 AM Victor Stinner <vstinner@python.org> wrote:
I wrote https://bugs.python.org/issue39511 and https://github.com/python/cpython/pull/18301 to have per-interpreter None, True and False singletons. My change is backward compatible on the C API: you can still use "Py_None" in your C code. The code gets the singleton object from the current interpreter with a function call:
#define Py_None Py_GetNone()
Py_GetNone() is implemented as: "return _PyInterpreterState_GET()->none;"
It's backward compatible for the C API, but not for the stable C ABI -- that exports Py_None directly as a symbol. You also need a solution for all the static global PyTypeObjects in C extensions. I don't think there's any API-compatible way to make those heap-allocated.

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Thu, Dec 16, 2021 at 2:29 AM Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Dec 15, 2021 at 3:07 AM Victor Stinner <vstinner@python.org> wrote:
I wrote https://bugs.python.org/issue39511 and https://github.com/python/cpython/pull/18301 to have per-interpreter None, True and False singletons. My change is backward compatible on the C API: you can still use "Py_None" in your C code. The code gets the singleton object from the current interpreter with a function call:
#define Py_None Py_GetNone()
Py_GetNone() is implemented as: "return _PyInterpreterState_GET()->none;"
It's backward compatible for the C API, but not for the stable C ABI -- that exports Py_None directly as a symbol.
You're right. But we can add a macro like Py_SUBINTERPRETER_API which would change the implementation:

* By default, "Py_None" would continue returning "&_Py_NoneStruct".
* If the Py_SUBINTERPRETER_API macro is defined, Py_None would call Py_GetNone().

=> no impact on the stable ABI (if used, the stable ABI is not supported)
=> no impact on performance (if not used)
=> only C extensions which opt in to "subinterpreters running in parallel" support (define Py_SUBINTERPRETER_API) would be impacted.

Stdlib C extensions would have to be built with Py_SUBINTERPRETER_API, but it is ok if they require a recent ABI since they are shipped with Python directly (and not currently built with the limited C API).

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
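In header terms the opt-in described above might look roughly like this (a sketch only; the macro name and Py_GetNone() come from the messages above, the rest is illustrative, not an actual CPython header):

    #ifdef Py_SUBINTERPRETER_API
    PyAPI_FUNC(PyObject *) Py_GetNone(void);
    #  define Py_None Py_GetNone()        /* per-interpreter lookup */
    #else
    PyAPI_DATA(PyObject) _Py_NoneStruct;
    #  define Py_None (&_Py_NoneStruct)   /* current behavior, stable-ABI compatible */
    #endif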
On 16. 12. 21 11:52, Victor Stinner wrote:
On Thu, Dec 16, 2021 at 2:29 AM Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Dec 15, 2021 at 3:07 AM Victor Stinner <vstinner@python.org> wrote:
I wrote https://bugs.python.org/issue39511 and https://github.com/python/cpython/pull/18301 to have per-interpreter None, True and False singletons. My change is backward compatible on the C API: you can still use "Py_None" in your C code. The code gets the singleton object from the current interpreter with a function call:
#define Py_None Py_GetNone()
Py_GetNone() is implemented as: "return _PyInterpreterState_GET()->none;"
It's backward compatible for the C API, but not for the stable C ABI -- that exports Py_None directly as a symbol.
You're right. But we can add a macro like Py_SUBINTERPRETER_API which would change the implementation:
* By default, "Py_None" would continue returning "&_Py_NoneStruct". * If Py_SUBINTERPRETER_API macro is defined, Py_None would call Py_GetNone().
=> no impact on the stable ABI (if used, the stable ABI is not supported)
The stable ABI could be orthogonal here -- you could compile for the stable ABI even with Py_SUBINTERPRETER_API. This would require (_PyInterpreterState_GET()->none == Py_None) in the "main" interpreter, and extensions without Py_SUBINTERPRETER_API only loadable in the "main" interpreter.
=> no impact on performance (if not used)
But it *would* be used in all of the stdlib, right?
=> only C extensions which opt-in for "subinterpreter running in parallel" support (define Py_SUBINTERPRETER_API) would be impacted.
Stdlib C extensions would have to be built with Py_SUBINTERPRETER_API, but it is ok if they require a recent ABI since they are shipped with Python directly (and not currently built with the limited C API).
Victor
On Thu, Dec 16, 2021 at 3:08 AM Petr Viktorin <encukou@gmail.com> wrote:
On 16. 12. 21 11:52, Victor Stinner wrote:
On Thu, Dec 16, 2021 at 2:29 AM Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Dec 15, 2021 at 3:07 AM Victor Stinner <vstinner@python.org> wrote:
I wrote https://bugs.python.org/issue39511 and https://github.com/python/cpython/pull/18301 to have per-interpreter None, True and False singletons. My change is backward compatible on the C API: you can still use "Py_None" in your C code. The code gets the singleton object from the current interpreter with a function call:
#define Py_None Py_GetNone()
Py_GetNone() is implemented as: "return _PyInterpreterState_GET()->none;"
It's backward compatible for the C API, but not for the stable C ABI -- that exports Py_None directly as a symbol.
You're right. But we can add a macro like Py_SUBINTERPRETER_API which would change the implementation:
* By default, "Py_None" would continue returning "&_Py_NoneStruct". * If Py_SUBINTERPRETER_API macro is defined, Py_None would call Py_GetNone().
=> no impact on the stable ABI (if used, the stable ABI is not supported)
The stable ABI could be orthogonal here -- you could compile for the stable ABI even with Py_SUBINTERPRETER_API.
This would require (_PyInterpreterState_GET()->none == Py_None) in the "main" interpreter, and extensions without Py_SUBINTERPRETER_API only loadable in the "main" interpreter.
That would slow things down a bit -- even if the pointer to the interpreter state was already in a register, getting None would require loading the pointer to Py_None from memory by indexing relative to that register. Compared to a plain global, the latter would be a "load constant" instruction, and Eric's other proposal (move the Py_None struct itself into the interpreter state) would be just adding a constant to the register (possibly the fastest solution, since that constant would be much smaller than the full address of the global).

Alas, that last version is not compatible with the stable ABI. So I'm still in favor of trying harder to make immortal objects a reality.

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
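A tiny self-contained sketch of the three access patterns being compared above (the struct and field names are invented for illustration):

    /* the three ways to obtain the None pointer discussed above */
    typedef struct { int dummy; } Obj;

    static Obj none_global;            /* today's static object: address is a link-time constant */

    typedef struct {
        Obj *none_ptr;                 /* per-interpreter pointer, in the style of PR 18301 */
        Obj  none_embedded;            /* object embedded directly in the interpreter state */
    } Interp;

    static inline Obj *get_plain(void)          { return &none_global; }        /* load constant */
    static inline Obj *get_indirect(Interp *it) { return it->none_ptr; }        /* extra memory load */
    static inline Obj *get_embedded(Interp *it) { return &it->none_embedded; }  /* register + small offset */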
On Tue, Dec 14, 2021 at 11:19 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
The idea of objects that never get deallocated isn't new and has been explored here several times. Not that long ago I tried it out by setting the refcount really high. That worked. Around the same time Eddie Elizondo at Facebook did something similar but modified Py_INCREF() and Py_DECREF() to keep the refcount from changing. Our solutions were similar but with different goals in mind. (Facebook wants to avoid copy-on-write in their pre-fork model.)
FTR, here are links to the above efforts:

* reducing CoW (Instagram): https://bugs.python.org/issue40255
* Eddie's PR: https://github.com/python/cpython/pull/19474
* my PR: https://github.com/python/cpython/pull/24828
* some other discussion: https://github.com/faster-cpython/ideas/issues/14

(I don't have a link to any additional work Eddie did to reduce the performance penalty.)

-eric
On Tue, Dec 14, 2021 at 11:19 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
There is one solution that would help both of the above in a nice way: "immortal" objects.
FYI, here are some observations that came up during some discussions with the "faster-cpython" team today:

* immortal objects should probably only be immutable ones (other than ob_refcnt, of course)
* GC concerns are less of an issue if a really high ref count (bit) is used to identify immortal objects
* ob_refcnt is part of the public API (sadly), so using it to mark immortal objects may be sensitive to interference
* ob_refcnt is part of the stable ABI (even more sadly), affecting any solution using ref counts
* using the ref count isn't the only viable approach; another would be checking the pointer itself (a rough sketch follows below)
  + put the object in a specific section of static data and compare the pointer against the bounds
  + this avoids loading the actual object data if it is immortal
  + for objects that are mostly treated as markers (e.g. None), this could have a meaningful impact
  + not compatible with dynamically allocated objects

-eric
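The pointer-bounds idea in that last bullet might look roughly like this (the section symbols are hypothetical; providing them would take a linker script or section attributes):

    #include <stdint.h>
    #include <Python.h>

    /* hypothetical symbols bounding the region that holds all immortal statics */
    extern char _immortal_start[];
    extern char _immortal_end[];

    static inline int is_immortal_by_address(PyObject *op)
    {
        uintptr_t p = (uintptr_t)op;
        /* no need to dereference op at all -- the address alone decides */
        return (uintptr_t)_immortal_start <= p && p < (uintptr_t)_immortal_end;
    }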
On Thu, Dec 16, 2021 at 6:03 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
* using the ref count isn't the only viable approach; another would be checking the pointer itself
  + put the object in a specific section of static data and compare the pointer against the bounds
  + this avoids loading the actual object data if it is immortal
  + for objects that are mostly treated as markers (e.g. None), this could have a meaningful impact
  + not compatible with dynamically allocated objects
Sorry if this is a dumb question, but would it be possible to solve that last point with an immortal arena [1] from which immortal objects could be allocated? None/True/False could be allocated there, but so could anything that is more dynamic, if it's decided as important enough. It would still be possible to recognize them by pointer (since the immortal arena would be a specific block of memory).

ChrisA

[1] That sounds like something from Norse mythology, actually
On Wed, Dec 15, 2021 at 12:18 PM Chris Angelico <rosuav@gmail.com> wrote:
Sorry if this is a dumb question, but would it be possible to solve that last point with an immortal arena [1] from which immortal objects could be allocated? None/True/False could be allocated there, but so could anything that is more dynamic, if it's decided as important enough. It would still be possible to recognize them by pointer (since the immortal arena would be a specific block of memory).
That's an interesting idea. An immortal arena would certainly be one approach to investigate. However, I'm not convinced there is enough value to justify going out of our way to allow dynamically allocated objects to be immortal. Keep in mind that the concept of immortal objects would probably not be available outside the internal API, and, internally, any objects we want to be immortal will probably be statically allocated. -eric
On Thu, Dec 16, 2021 at 7:03 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Wed, Dec 15, 2021 at 12:18 PM Chris Angelico <rosuav@gmail.com> wrote:
Sorry if this is a dumb question, but would it be possible to solve that last point with an immortal arena [1] from which immortal objects could be allocated? None/True/False could be allocated there, but so could anything that is more dynamic, if it's decided as important enough. It would still be possible to recognize them by pointer (since the immortal arena would be a specific block of memory).
That's an interesting idea. An immortal arena would certainly be one approach to investigate.
However, I'm not convinced there is enough value to justify going out of our way to allow dynamically allocated objects to be immortal. Keep in mind that the concept of immortal objects would probably not be available outside the internal API, and, internally, any objects we want to be immortal will probably be statically allocated.
That makes sense. Thanks. ChrisA
fwiw we added immortal objects to Pyston and haven't run into any issues with it. The goal is a bit different: to eliminate common refcount operations for performance, which we can do a bit more of because we have a JIT. And we don't mind if unaware code ends up changing the refcounts of immortal objects since it's no worse for us than before.

So anyway maybe it's not super comparable for the issues discussed here, but at least we haven't run into any issues of extension modules being confused by very large reference counts.

The one issue we do run into is that quite a few projects will test in debug mode that their C extension doesn't leak reference counts, and that no longer works for us because we don't update Py_RefTotal for immortal objects.

kmod

On Wed, Dec 15, 2021 at 2:02 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Tue, Dec 14, 2021 at 11:19 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
There is one solution that would help both of the above in a nice way: "immortal" objects.
FYI, here are some observations that came up during some discussions with the "faster-cpython" team today:
* immortal objects should probably only be immutable ones (other than ob_refcnt, of course)
* GC concerns are less of an issue if a really high ref count (bit) is used to identify immortal objects
* ob_refcnt is part of the public API (sadly), so using it to mark immortal objects may be sensitive to interference
* ob_refcnt is part of the stable ABI (even more sadly), affecting any solution using ref counts
* using the ref count isn't the only viable approach; another would be checking the pointer itself
  + put the object in a specific section of static data and compare the pointer against the bounds
  + this avoids loading the actual object data if it is immortal
  + for objects that are mostly treated as markers (e.g. None), this could have a meaningful impact
  + not compatible with dynamically allocated objects
-eric
I've just updated the original Immortal Instances PR with a bunch of tricks that I used to achieve as much performance parity as possible: https://github.com/python/cpython/pull/19474. You can see the details along with some benchmarks in the PR itself. This should address a bunch of the original performance concerns. Furthermore, it opens up the possibility of iterating on top of this to keep improving perf (e.g. immortal interned strings, immortal heap types, fewer GC cycles from moving long-lived objects to the permanent generation, etc.).
On Fri, Dec 17, 2021 at 11:35:24AM +1300, Greg Ewing wrote:
On 17/12/21 6:52 am, Eddie Elizondo via Python-Dev wrote:
I've just updated the original Immortal Instances PR
Is it just me, or does Immortal Instances sound like a video game franchise?
Or a Doctor Who episode. Doctor Who and the Immortal Instances of Doom.

-- Steve
participants (16)
- Antoine Pitrou
- Brett Cannon
- Chris Angelico
- Christian Heimes
- Eddie Elizondo
- Eric Snow
- Greg Ewing
- Guido van Rossum
- Jim J. Jewett
- Kevin Modzelewski
- Nathaniel Smith
- Pablo Galindo Salgado
- Petr Viktorin
- Skip Montanaro
- Steven D'Aprano
- Victor Stinner