my plans for subinterpreters (and a per-interpreter GIL)
Hi all,

I'm still hoping to land a per-interpreter GIL for 3.11. There is still a decent amount of work to be done, but little of it will require solving any big problems:

* pull remaining static globals into _PyRuntimeState and PyInterpreterState
* minor updates to PEP 554
* finish up the last couple pieces of the PEP 554 implementation
* maybe publish a companion PEP about per-interpreter GIL

There are also a few decisions to be made. I'll open a couple of other threads to get feedback on those. Here I'd like your thoughts on the following:

Do we need a PEP about per-interpreter GIL?

I haven't thought there would be much value in such a PEP. There doesn't seem to be any decision that needs to be made. At best the PEP would be an explanation of the project, where:

* the objective has gotten a lot of support (and we're working on addressing the concerns of the few objectors)
* most of the required work is worth doing regardless (e.g. improve runtime init/fini, eliminate static globals)
* the performance impact is likely to be a net improvement
* it is fully backward compatible and the C-API is essentially unaffected

So the value of a PEP would lie in consolidating an explanation of the project into a single document. That seems like a poor fit for a PEP.

(You might wonder, "what about PEP 554?" I purposefully avoided any discussion of the GIL in PEP 554. Its purpose is to expose subinterpreters to Python code.)

However, perhaps I'm too close to it all. I'd like your thoughts on the matter.

Thanks!

-eric
How did you end up solving the issue where Py_None is a static global that's exposed as part of the stable C ABI? On Tue, Dec 14, 2021 at 9:13 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
-eric

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PNLBJBNI...
Code of Conduct: http://python.org/psf/codeofconduct/
-- Nathaniel J. Smith -- https://vorpus.org
Whoops, never mind, I see that you started the "immortal objects" thread to discuss this. On Tue, Dec 14, 2021 at 4:54 PM Nathaniel Smith <njs@pobox.com> wrote:
How did you end up solving the issue where Py_None is a static global that's exposed as part of the stable C ABI?
-- Nathaniel J. Smith -- https://vorpus.org
+1 for consolidated documentation about per-interpreter GIL. On Tue, Dec 14, 2021 at 9:12 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Do we need a PEP about per-interpreter GIL?
I haven't thought there would be much value in such a PEP. There doesn't seem to be any decision that needs to be made. At best the PEP would be an explanation of the project, where:
Even if there's no decision to be made, I think an informational PEP would be valuable. Maybe even a Standards Track PEP? (since it is technically a new feature)
* the objective has gotten a lot of support (and we're working on addressing the concerns of the few objectors)
There's value in documenting the concerns and how they are being addressed, and a PEP sounds like a good place to capture that.
* most of the required work is worth doing regardless (e.g. improve runtime init/fini, eliminate static globals)
* the performance impact is likely to be a net improvement
Also worth documenting that in the PEP (once there are benchmark results).
* it is fully backward compatible and the C-API is essentially unaffected
Since this is a likely concern, a PEP is a good place to address it.
So the value of a PEP would be in consolidating an explanation of the project into a single document. It seems like a poor fit for a PEP.
There is value in consolidating the project rationale, details, objections, etc. Is it a poor fit for a PEP? I don't know - is there a better alternative? I guess it could be covered in the docs or devguide instead, but I don't see a philosophical issue with using a PEP for this.
Hi, On Tue, Dec 14, 2021 at 6:13 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
I'm still hoping to land a per-interpreter GIL for 3.11. There is still a decent amount of work to be done but little of it will require solving any big problems:
* pull remaining static globals into _PyRuntimeState and PyInterpreterState
I'm tracking the remaining issues for a per-interpreter GIL in two places:

* https://bugs.python.org/issue40512
* https://pythondev.readthedocs.io/subinterpreters.html

I also wrote a recent summary of what has been done and what remains to be done: https://vstinner.github.io/isolate-subinterpreters.html

Extract:

"""
There are still multiple interesting technical challenges:

* bpo-39511: Per-interpreter singletons (None, True, False, etc.)
* bpo-40601: Hide static types from the C API
* Make the pymalloc allocator compatible with subinterpreters.
* Make the GIL per-interpreter. Maybe even give the choice to share or not share the GIL when a subinterpreter is created.
* Make the _PyArg_Parser (parser_init()) function compatible with subinterpreters. Maybe use a per-interpreter array, a solution similar to _PyUnicode_FromId().
* bpo-15751: Make the PyGILState API compatible with subinterpreters (issue created in 2012!)
* bpo-40522: Get the current Python interpreter state from Thread Local Storage (autoTSSkey)

Also, there are still many static types to convert to heap types (bpo-40077) and many extension modules to convert to the multiphase initialization API (bpo-163574).
"""

IMO "bpo-40601: Hide static types from the C API" and "bpo-40522: Get the current Python interpreter state from Thread Local Storage" are non-trivial issues. Otherwise, they would already be solved. And these are strict prerequisites for having one GIL per interpreter.
* the objective has gotten a lot of support (and we're working on addressing the concerns of the few objectors)
* most of the required work is worth doing regardless (e.g. improve runtime init/fini, eliminate static globals)
* the performance impact is likely to be a net improvement
* it is fully backward compatible and the C-API is essentially unaffected
The work required by subinterpreters helps with cleaning up *all* Python objects at exit, which helps another use case: embedding Python in an application, especially loading multiple instances of Python when Python is used as a plugin.

About the C API changes: until the "per-interpreter GIL" feature is fully implemented, I'm not 100% sure that no C API change is needed. Obviously, I hope that no change will be needed ;-)

Also, some changes needed by subinterpreters introduce a small slowdown. I tried to measure it each time I noticed a potential slowdown. The largest was the _PyUnicode_FromId() change, which added about 1 nanosecond per function call, if I recall correctly.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Hi Eric,

Did you try to take into account the envisioned project for adding a "complete" GC and removing the GIL?

Regards

Antoine.

On Tue, 14 Dec 2021 10:12:07 -0700 Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Wed, 15 Dec 2021 14:13:03 +0100 Antoine Pitrou <antoine@python.org> wrote:
Hi Eric,
Did you try to take into account the envisioned project for adding a "complete" GC and removing the GIL?
Sorry, I was misremembering the details. Sam Gross' proposal (posted here on 07/10/2021) doesn't switch to a "complete GC", but it changes reference counting to a more sophisticated scheme (which includes immortalization of objects): https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd...

Regards

Antoine.
On Wed, Dec 15, 2021 at 6:04 AM Antoine Pitrou <antoine@python.org> wrote:
On Wed, 15 Dec 2021 14:13:03 +0100 Antoine Pitrou <antoine@python.org> wrote:
Did you try to take into account the envisioned project for adding a "complete" GC and removing the GIL?
Sorry, I was misremembering the details. Sam Gross' proposal (posted here on 07/10/2021) doesn't switch to a "complete GC", but it changes reference counting to a more sophisticated scheme (which includes immortalization of objects):
https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd...
A note about this: Sam's immortalization covers exactly the objects that Eric is planning to move into the interpreter state struct: "such as interned strings, small integers, statically allocated PyTypeObjects, and the True, False, and None objects". (Well, he says "such as" but I think so does Eric. :-)

Sam's approach is to use the lower bit of the ob_refcnt field to indicate immortal objects. This would not work given the stable ABI (which has macros that directly increment and decrement the ob_refcnt field). In fact, I think that Sam's work doesn't preserve the stable ABI at all. However, setting a very high bit (the bit just below the sign bit) would probably work.

Say we're using 32 bits. We use the value 0x_6000_0000 as the initial refcount for immortal objects. The stable ABI will sometimes increment this, sometimes decrement it. But as long as the imbalance is less than 0x_2000_0000, the refcount will remain in the inclusive range [0x_4000_0000, 0x_7FFF_FFFF] and we can test for immortality by testing a single bit:

    if (o->ob_refcnt & 0x_4000_0000)

I don't know how long that would take, but I suspect that a program that just increments the refcount relentlessly would have to run for hours before hitting this range. On a 64-bit machine the same approach would require years to run before a refcount would exceed the maximum allowable imbalance. (These estimates are from Mark Shannon.)

Another potential issue is that there may be some applications that take refcounts at face value (perhaps obtained using sys.getrefcount()). These would find that immortal objects have a very large refcount, which might surprise them. But technically a very large refcount is totally valid, and the kinds of objects that we plan to immortalize are all widely shared -- who cares if the refcount for None is 5000 or 1610612736? As long as the refcount of *mortal* objects is the same as it was before, this shouldn't be a problem.
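The arithmetic behind the 32-bit scheme above can be sanity-checked with a short simulation (a sketch only; the constants come from the message, not from CPython source):

```python
# Sketch of the high-bit immortality scheme for the 32-bit case.
# An immortal object starts at 0x6000_0000; stable-ABI extensions may blindly
# increment or decrement the field, but as long as the net imbalance stays
# below 0x2000_0000 the refcount remains in [0x4000_0000, 0x7FFF_FFFF],
# so bit 0x4000_0000 alone identifies immortal objects.

IMMORTAL_BIT = 0x4000_0000
IMMORTAL_INITIAL = 0x6000_0000
MAX_IMBALANCE = 0x2000_0000

def is_immortal(ob_refcnt: int) -> bool:
    return bool(ob_refcnt & IMMORTAL_BIT)

# Worst-case drift in either direction still keeps the bit set:
assert is_immortal(IMMORTAL_INITIAL + (MAX_IMBALANCE - 1))  # 0x7FFF_FFFF
assert is_immortal(IMMORTAL_INITIAL - (MAX_IMBALANCE - 1))  # 0x4000_0001

# Ordinary (mortal) refcounts below 0x4000_0000 never trip the test:
assert not is_immortal(5000)
```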
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Wed, Dec 15, 2021 at 3:00 PM Guido van Rossum <guido@python.org> wrote:
who cares if the refcount for None is 5000 or 1610612736? As long as the refcount of *mortal* objects is the same as it was before, this shouldn't be a problem.
indeed:

$ python -c "import sys; print(sys.getrefcount(None))"
4110

and a newly started iPython session:

In [2]: sys.getrefcount(None)
Out[2]: 28491

It does seem a bit silly to actually be tracking that refcount :-)

-CHB

--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
It does seem a bit silly to actually be tracking that refcount :-)
Not that silly. It can easily help in C extensions to detect wrong DECREF calls:
import ctypes
non = ctypes.c_long.from_address(id(None))
non.value = 10
Fatal Python error: none_dealloc: deallocating None Python runtime state: finalizing (tstate=0x000055c66cf263f0)
Current thread 0x00007f4afa383740 (most recent call first):
<no Python frame>
[1] 635685 abort (core dumped) python

On Thu, 16 Dec 2021 at 00:31, Christopher Barker <pythonchb@gmail.com> wrote:
On 2021-12-15 2:57 p.m., Guido van Rossum wrote:
But as long as the imbalance is less than 0x_2000_0000, the refcount will remain in the inclusive range [ 0x_4000_0000 , 0x_7FFF_FFFF ] and we can test for immortality by testing a single bit:
if (o->ob_refcnt & 0x_4000_0000)
Could we have a full GC pass reset those counts to make it even more unlikely to get out of bounds?

Allocating immortal objects from a specific memory region seems like another idea worth pursuing. It seems mimalloc has the ability to allocate pools aligned to certain large boundaries. That takes some platform-specific magic. If we can do that, the test for immortality is pretty cheap. However, if you can't allocate them at a fixed region determined at compile time, I don't think you can match the performance of the code above.

Maybe it helps that you could determine immortality by looking at the PyObject pointer, without loading the ob_refcnt value from memory? You would do something like:

    if (((uintptr_t)o) & _Py_immortal_mask)

The _Py_immortal_mask value would not be known at compile time but would be a global constant. So, it would be cached by the CPU.
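The pointer-based test can be modeled in a few lines (an illustrative sketch, not CPython code; the arena size, base address, and function names here are all made up):

```python
# Model of a pointer-based immortality test: if all immortal objects live in
# one arena aligned to its own size, an address can be classified without
# reading ob_refcnt at all -- mask off the low bits and compare against the
# arena base. All names and values here are hypothetical.

ARENA_SIZE = 1 << 20                  # 1 MiB arena, aligned to 1 MiB
arena_base = 0x7F3A_0000_0000         # pretend value returned by the allocator
assert arena_base % ARENA_SIZE == 0   # alignment is what makes the mask work

def in_immortal_arena(addr: int) -> bool:
    # Clearing the low bits recovers the arena base for any address inside it.
    return (addr & ~(ARENA_SIZE - 1)) == arena_base

assert in_immortal_arena(arena_base + 0x1234)          # inside the arena
assert not in_immortal_arena(arena_base - 1)           # just below it
assert not in_immortal_arena(arena_base + ARENA_SIZE)  # just past its end
```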
(I just realized that we started discussing details of immortal objects in the wrong thread -- this is Eric's overview thread, there's a separate thread on immortal objects. But alla, I'll respond here below.) On Wed, Dec 15, 2021 at 5:05 PM Neil Schemenauer <neil@python.ca> wrote:
On 2021-12-15 2:57 p.m., Guido van Rossum wrote:
But as long as the imbalance is less than 0x_2000_0000, the refcount will remain in the inclusive range [ 0x_4000_0000 , 0x_7FFF_FFFF ] and we can test for immortality by testing a single bit:
if (o->ob_refcnt & 0x_4000_0000)
Could we have a full GC pass reset those counts to make it even more unlikely to get out of bounds?
Maybe, but so far these are all immutable singletons that aren't linked into the GC at all. Of course we could just add extra code to the GC code that just resets all these refcounts, but since there are ~260 small integers that might slow things down more than we'd like. More testing is required. Maybe we can get away with doing nothing on 64-bit machines but we'll have to slow down a tad for 32-bit -- that would be acceptable (since the future is clearly 64-bit).
Allocating immortal objects from a specific memory region seems like another idea worth pursuing. It seems mimalloc has the ability to allocate pools aligned to certain large boundaries. That takes some platform specific magic. If we can do that, the test for immortality is pretty cheap. However, if you can't allocate them at a fixed region determined at compile time, I don't think you can match the performance of the code above. Maybe it helps that you could determine immortality by looking at the PyObject pointer and without loading the ob_refcnt value from memory? You would do something like:
if (((uintptr_t)o) & _Py_immortal_mask)
The _Py_immortal_mask value would not be known at compile time but would be a global constant. So, it would be cached by the CPU.
Very clever.

--
--Guido van Rossum (python.org/~guido)
On 16. 12. 21 2:54, Guido van Rossum wrote:
Allocating immortal objects from a specific memory region seems like another idea worth pursuing. It seems mimalloc has the ability to allocate pools aligned to certain large boundaries. That takes some platform specific magic. If we can do that, the test for immortality is pretty cheap. However, if you can't allocate them at a fixed region determined at compile time, I don't think you can match the performance of the code above. Maybe it helps that you could determine immortality by looking at the PyObject pointer and without loading the ob_refcnt value from memory? You would do something like:
if (((uintptr_t)o) & _Py_immortal_mask)
The _Py_immortal_mask value would not be known at compile time but would be a global constant. So, it would be cached by the CPU.
Immortal objects should be allocated dynamically. AFAIK, determining whether something was malloc'd or not would need to be platform-specific.
On Wed, Dec 15, 2021 at 2:57 PM Guido van Rossum <guido@python.org> wrote:
I don't know how long that would take, but I suspect that a program that just increments the refcount relentlessly would have to run for hours before hitting this range. On a 64-bit machine the same approach would require years to run before a refcount would exceed the maximum allowable imbalance. (These estimates are from Mark Shannon.)
Hm, not quite. I modified a fast builtin to incref its argument, and then I called it in a `while True` loop, interrupted, and timed it. This did ~24,000,000 INCREFs/second. This would hit 0x_2000_0000 in about 9 minutes. And I wasn't even trying that hard -- I could have written the loop in C. (I did comment out an audit call though. :-)

The same loop on 64-bit would take 1700 years to reach the limit, so we're safe there.

--
--Guido van Rossum (python.org/~guido)
On Wed, Dec 15, 2021 at 6:21 PM Guido van Rossum <guido@python.org> wrote:
On Wed, Dec 15, 2021 at 2:57 PM Guido van Rossum <guido@python.org> wrote:
I don't know how long that would take, but I suspect that a program that just increments the refcount relentlessly would have to run for hours before hitting this range. On a 64-bit machine the same approach would require years to run before a refcount would exceed the maximum allowable imbalance. (These estimates are from Mark Shannon.)
Hm, not quite. I modified a fast builtin to incref its argument, and then I called it in a `while True` loop, interrupted, and timed it. This did ~24,000,000 INCREFs/second. This would hit 0x_2000_0000 in about 9 minutes. And I wasn't even trying that hard -- I could have written the loop in C. (I did comment out an audit call though. :-) The same loop on 64-bit would take 1700 years to reach the limit, so we're safe there.
Similar 32-bit vs 64-bit overflow estimates were made by Victor Stinner in the dict version tag PEP 509: https://www.python.org/dev/peps/pep-0509/#integer-overflow

tl;dr: ~4 seconds on 32-bit and ~584 years on 64-bit. Granted, the risk there is only for *exactly* `2 ** (Nbits)` increments.
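Those PEP 509 figures are easy to re-derive, assuming one increment per nanosecond as the PEP does:

```python
# Back-of-the-envelope check of the overflow estimates quoted from PEP 509,
# assuming a counter incremented once per nanosecond.
NS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

wrap_32 = 2**32 / NS_PER_SECOND                     # seconds to wrap 32 bits
wrap_64 = 2**64 / NS_PER_SECOND / SECONDS_PER_YEAR  # years to wrap 64 bits

assert round(wrap_32) == 4   # ~4 seconds on 32-bit
assert int(wrap_64) == 584   # ~584 years on 64-bit
```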
On Wed, Dec 15, 2021 at 02:57:46PM -0800, Guido van Rossum wrote:
Another potential issue is that there may be some applications that take refcounts at face value (perhaps obtained using sys.getrefcount()). These would find that immortal objects have a very large refcount, which might surprise them. But technically a very large refcount is totally valid, and the kinds of objects that we plan to immortalize are all widely shared -- who cares if the refcount for None is 5000 or 1610612736? As long as the refcount of *mortal* objects is the same as it was before, this shouldn't be a problem.
I agree with your reasoning. But can we agree to document the presence and interpretation of the magic bit, so that if anyone actually does care (for whatever reason, good, bad or indifferent) they can mask off the immortal bit to get the real ref num?

Or maybe even have getrefcount() automatically mask the bit off. If we reserve the bit as the immortal bit, then is there any reason to keep that bit visible when returning refcounts?

--
Steve
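The masking suggestion amounts to something like the following (purely hypothetical: neither this bit value nor a masking getrefcount() exists in CPython; the constant reuses the 32-bit value from earlier in the thread):

```python
# Hypothetical sketch of masking a documented immortal bit out of the value
# reported to Python code. Note that for immortal objects the remainder after
# masking still would not be a meaningful reference count.

IMMORTAL_BIT = 0x4000_0000

def reported_refcount(raw_ob_refcnt: int) -> int:
    # Strip the reserved immortal bit; mortal counts pass through unchanged.
    return raw_ob_refcnt & ~IMMORTAL_BIT

assert reported_refcount(0x4000_0005) == 5   # immortal bit stripped
assert reported_refcount(42) == 42           # mortal count unchanged
```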
On Thu, 16 Dec 2021 14:32:05 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Dec 15, 2021 at 02:57:46PM -0800, Guido van Rossum wrote:
Another potential issue is that there may be some applications that take refcounts at face value (perhaps obtained using sys.getrefcount()). These would find that immortal objects have a very large refcount, which might surprise them. But technically a very large refcount is totally valid, and the kinds of objects that we plan to immortalize are all widely shared -- who cares if the refcount for None is 5000 or 1610612736? As long as the refcount of *mortal* objects is the same as it was before, this shouldn't be a problem.
I agree with your reasoning. But can we agree to document the presence and interpretation of the magic bit, so that if anyone actually does care (for whatever reason, good bad or indifferent) they can mask off the immortal bit to get the real ref num?
The "real number of references" would not be known for immortal objects. Regards Antoine.
On Thu, 16 Dec 2021 23:32:17 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Dec 16, 2021 at 12:23:09PM +0100, Antoine Pitrou wrote:
The "real number of references" would not be known for immortal objects.
Oh that surprises me. How does that work? Does that imply that some code might not increment the ref count, while other code will?
If an object is immortal, then its refcount wouldn't change at all. Quoting Sam's design document:

« Some objects, such as interned strings, small integers, statically allocated PyTypeObjects, and the True, False, and None objects stay alive for the lifetime of the program. These objects are marked as immortal by setting the least-significant bit of the local reference count field (bit 0). The Py_INCREF and Py_DECREF macros are no-ops for these objects. This avoids contention on the reference count fields of these objects when they are accessed concurrently by multiple threads. »

Regards

Antoine.
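The quoted scheme can be modeled roughly like this (a deliberate simplification of Sam Gross's design: here mortal counts are stored shifted up by one bit so they can never collide with the immortal flag; the real nogil implementation is more involved):

```python
# Rough model of the quoted nogil scheme: bit 0 of the local refcount marks
# an object immortal, and incref/decref become no-ops for such objects.
# Mortal counts are kept in the bits above bit 0 in this simplification.

IMMORTAL_FLAG = 0x1

def incref(refcnt: int) -> int:
    # No-op for immortal objects: no contention on their cache lines.
    return refcnt if refcnt & IMMORTAL_FLAG else refcnt + 2

def decref(refcnt: int) -> int:
    return refcnt if refcnt & IMMORTAL_FLAG else refcnt - 2

mortal = 3 << 1            # a mortal object holding 3 references
immortal = IMMORTAL_FLAG   # e.g. None, True, small ints

assert incref(mortal) == 4 << 1      # mortal counts change normally
assert incref(immortal) == immortal  # immortal: refcount never changes
assert decref(immortal) == immortal
```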
On 15. 12. 21 23:57, Guido van Rossum wrote:
On Wed, Dec 15, 2021 at 6:04 AM Antoine Pitrou <antoine@python.org> wrote:

On Wed, 15 Dec 2021 14:13:03 +0100 Antoine Pitrou <antoine@python.org> wrote:
> Did you try to take into account the envisioned project for adding a
> "complete" GC and removing the GIL?
Sorry, I was misremembering the details. Sam Gross' proposal (posted here on 07/10/2021) doesn't switch to a "complete GC", but it changes reference counting to a more sophisticated scheme (which includes immortalization of objects):
https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd... <https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd...>
A note about this: Sam's immortalization covers exactly the objects that Eric is planning to move into the interpreter state struct: "such as interned strings, small integers, statically allocated PyTypeObjects, and the True, False, and None objects". (Well, he says "such as" but I think so does Eric. :-)
Sam's approach is to use the lower bit of the ob_refcnt field to indicate immortal objects. This would not work given the stable ABI (which has macros that directly increment and decrement the ob_refcnt field). In fact, I think that Sam's work doesn't preserve the stable ABI at all. However, setting a very high bit (the bit just below the sign bit) would probably work. Say we're using 32 bits. We use the value 0x_6000_0000 as the initial refcount for immortal objects. The stable ABI will sometimes increment this, sometimes decrement it. But as long as the imbalance is less than 0x_2000_0000, the refcount will remain in the inclusive range [ 0x_4000_0000 , 0x_7FFF_FFFF ] and we can test for immortality by testing a single bit:
if (o->ob_refcnt & 0x40000000)
I don't know how long that would take, but I suspect that a program that just increments the refcount relentlessly would have to run for hours before hitting this range. On a 64-bit machine the same approach would require years to run before a refcount would exceed the maximum allowable imbalance. (These estimates are from Mark Shannon.)
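The tolerance argument above can be sanity-checked with a little Python arithmetic (a sketch of the scheme Guido describes, not CPython code):

```python
IMMORTAL_TEST_BIT = 0x_4000_0000  # single bit tested for immortality
IMMORTAL_INITIAL = 0x_6000_0000   # initial refcount for immortal objects

def is_immortal(rc):
    return bool(rc & IMMORTAL_TEST_BIT)

# Old stable-ABI code may incref/decref freely; as long as the net
# imbalance stays below 0x_2000_0000, the test bit remains set.
assert is_immortal(IMMORTAL_INITIAL - 0x_1FFF_FFFF)  # heavy net decrefs
assert is_immortal(IMMORTAL_INITIAL + 0x_1FFF_FFFF)  # heavy net increfs
# Past the tolerated imbalance, the single-bit test breaks down:
assert not is_immortal(IMMORTAL_INITIAL + 0x_2000_0000)
```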
But does the sign bit need to stay intact, and do we actually need to rely on the immortal bit to always be set for immortal objects? If the refcount rolls over to zero, an immortal object's dealloc could bump it back and give itself another few minutes. Allowing such rollover would mean having to deal with negative refcounts, but that might be acceptable.
Another potential issue is that there may be some applications that take refcounts at face value (perhaps obtained using sys.getrefcount()). These would find that immortal objects have a very large refcount, which might surprise them. But technically a very large refcount is totally valid, and the kinds of objects that we plan to immortalize are all widely shared -- who cares if the refcount for None is 5000 or 1610612736? As long as the refcount of *mortal* objects is the same as it was before, this shouldn't be a problem.
A very small refcount would be even more surprising, but the same logic applies: who cares if the refcount for None is 5000 or -5000?
On Thu, Dec 16, 2021 at 2:48 AM Petr Viktorin <encukou@gmail.com> wrote:
But does the sign bit need to stay intact, and do we actually need to rely on the immortal bit to always be set for immortal objects? If the refcount rolls over to zero, an immortal object's dealloc could bump it back and give itself another few minutes. Allowing such rollover would mean having to deal with negative refcounts, but that might be acceptable.
FWIW, my original attempt at immortal objects (quite a while ago) used the sign bit as the marker (negative refcount meant immortal). However, this broke GC and Py_DECREF() and getting those to work right was a pain. It also made a few things harder to debug because a negative refcount no longer necessarily indicated something had gone wrong. In the end I switched to a really high bit as the marker and it was all much simpler. -eric
On Thu, Dec 16, 2021 at 12:00 AM Guido van Rossum <guido@python.org> wrote:
Sam's approach is to use the lower bit of the ob_refcnt field to indicate immortal objects. This would not work given the stable ABI (which has macros that directly increment and decrement the ob_refcnt field). (...) we can test for immortality by testing a single bit:
if (o->ob_refcnt & 0x40000000)
If PyQt5 or pycryptography is built with the Python 3.9 limited C API, it will use the old Py_INCREF/Py_DECREF, which don't have special code for immortal objects. I understand that if a C extension built for an old stable ABI decrements the refcount below the limit, the object becomes mortal (can be deleted), no?

I'm thinking about the case where a C extension is used in subinterpreters run in parallel (one GIL per interpreter) without any kind of locking around Py_INCREF/Py_DECREF: "data races" made "on purpose" (so as not to make Py_INCREF/Py_DECREF slower in the general case). Or you can think of a similar scenario with the "nogil" project.

For now, I suggest considering the "subinterpreters running in parallel" and nogil use cases as special, and requiring C extensions to be built with a special option, since there are other C API changes which are incompatible with the stable ABI anyway:

* Subinterpreters running in parallel are not compatible with static types: it requires changing the ABI * nogil changes the PyObject structure, Py_INCREF and Py_DECREF: it requires changing the ABI

Victor -- Night gathers, and now my watch begins. It shall not end until my death.
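The data-race concern can be illustrated with a deterministic interleaving in Python (a hypothetical schedule, not real CPython behavior): two threads performing an unlocked read-modify-write on the same refcount can lose an update.

```python
rc = 100          # shared refcount, no lock around it
t1_read = rc      # thread 1 reads 100
t2_read = rc      # thread 2 reads 100 before thread 1 writes back
rc = t1_read + 1  # thread 1 stores 101
rc = t2_read + 1  # thread 2 also stores 101: one incref is lost
assert rc == 101  # two increfs happened, but the count rose by only one
```

Repeated long enough against an immortal object's refcount, lost updates like this could drift the count outside the tolerated range.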
On Wed, Dec 15, 2021 at 02:57:46PM -0800, Guido van Rossum wrote:
On Wed, Dec 15, 2021 at 6:04 AM Antoine Pitrou <antoine@python.org> wrote:
On Wed, 15 Dec 2021 14:13:03 +0100 Antoine Pitrou <antoine@python.org> wrote:
Did you try to take into account the envisioned project for adding a "complete" GC and removing the GIL?
Sorry, I was misremembering the details. Sam Gross' proposal (posted here on 07/10/2021) doesn't switch to a "complete GC", but it changes reference counting to a more sophisticated scheme (which includes immortalization of objects):
https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd...
A note about this: Sam's immortalization covers exactly the objects that Eric is planning to move into the interpreter state struct: "such as interned strings, small integers, statically allocated PyTypeObjects, and the True, False, and None objects". (Well, he says "such as" but I think so does Eric. :-)
Sam's approach is to use the lower bit of the ob_refcnt field to indicate immortal objects. This would not work given the stable ABI (which has macros that directly increment and decrement the ob_refcnt field). In fact, I think that Sam's work doesn't preserve the stable ABI at all. However, setting a very high bit (the bit just below the sign bit) would probably work. Say we're using 32 bits. We use the value 0x_6000_0000 as the initial refcount for immortal objects. The stable ABI will sometimes increment this, sometimes decrement it. But as long as the imbalance is less than 0x_2000_0000, the refcount will remain in the inclusive range [ 0x_4000_0000 , 0x_7FFF_FFFF ] and we can test for immortality by testing a single bit:
if (o->ob_refcnt & 0x40000000)
I don't know how long that would take, but I suspect that a program that just increments the refcount relentlessly would have to run for hours before hitting this range. On a 64-bit machine the same approach would require years to run before a refcount would exceed the maximum allowable imbalance. (These estimates are from Mark Shannon.)
I did some research on this a few years back. I was curious what sort of "max reference counts" were encountered in the wild, in long-running real-life programs. For the same reason: I wanted to get some insight into how many unused bits could possibly be repurposed for future shenanigans (I had PyParallel* in mind at the time).

I added some logic to capture* the max reference counts of the None, True, and Zero objects (in a trace callback), then ran a really long simulation program of a client's (it ran for about 5-6 hours). The results were as follows:

MaxNoneRefCount 9,364,132 MaxTrueRefCount 204,215 MaxZeroRefCount 36,784

I thought that was pretty interesting. Potentially many, many upper bits for the taking. The code also had some logic that would trigger an `int 3` breakpoint as soon as a 32-bit refcnt overflowed, and that never hit either (obviously, based on the numbers above).

I also failed to come up with real-life code that would result in a Python object having a reference count higher than None's refcnt, but that may have just been from lack of creativity.

Just thought I'd share.

Regards, Trent.

[*] 1: https://github.com/pyparallel/pyparallel [*] 2: https://github.com/tpn/tracer/blob/master/PythonTracer/PythonTracer.h#L690
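A much cruder spot-check than Trent's tracer can be done from pure Python with sys.getrefcount (note it reports the count at one instant, not a running maximum, and on CPython versions with immortal objects the singletons report a large constant):

```python
import sys

# Snapshot the current refcounts of some widely shared singletons.
for obj in (None, True, 0):
    print(type(obj).__name__, sys.getrefcount(obj))
```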
On Wed, Jan 5, 2022, 15:02 Trent Nelson <trent@trent.me> wrote:
I thought that was pretty interesting. Potentially many, many upper bits for the taking. The code also had some logic that would int 3 as soon as a 32-bit refcnt overflowed, and that never hit either (obviously, based on the numbers above).
I also failed to come up with real-life code that would result in a Python object having a reference count higher than None's refcnt, but that may have just been from lack of creativity.
Just thought I'd share.
Thanks, Trent. That's super helpful. -eric
On Wed, Jan 05, 2022 at 01:59:21PM -0800, Trent Nelson wrote:
I did some research on this a few years back. I was curious what sort of "max reference counts" were encountered in the wild, in long-running real life programs. For the same reason: I wanted to get some insight into how many unused bits could possibly be repurposed for future shenanigans (I had PyParallel* in the mind at the time).
I added some logic to capture* the max reference counts of the None, True, and Zero objects (in a trace callback), then ran a really long simulation program of a client's (it ran for about 5-6 hours). The results were as follows:
MaxNoneRefCount 9,364,132 MaxTrueRefCount 204,215 MaxZeroRefCount 36,784
Just double-checked my results; there were a handful of runs with higher counts: MaxNoneRefCount 59,834,444 MaxTrueRefCount 1,072,467 MaxZeroRefCount 3,460,921 Regards, Trent.
On Thu, Jan 6, 2022 at 7:00 AM Trent Nelson <trent@trent.me> wrote:
I did some research on this a few years back. I was curious what sort of "max reference counts" were encountered in the wild, in long-running real life programs. For the same reason: I wanted to get some insight into how many unused bits could possibly be repurposed for future shenanigans (I had PyParallel* in the mind at the time).
I think we can assume the upper bound on the reference count is the same as the upper bound on the number of pointers. On a 32-bit machine, the memory space is 2**32 bytes and a pointer takes 4 bytes. And the NULL address cannot store a pointer. So the upper bound on the refcnt is 2**30 - 1, which means we have two free bits in the refcnt. On a 64-bit machine, we have at least four free bits, for the same reason. Regards, -- Inada Naoki <songofacandy@gmail.com>
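The 32-bit bound above can be checked numerically (simple arithmetic, not CPython code):

```python
# Every live reference occupies at least one 4-byte pointer slot in a
# 2**32-byte address space, so at most 2**30 references can exist.
ptr_bits, ptr_size = 32, 4
max_refs = 2 ** ptr_bits // ptr_size
assert max_refs == 2 ** 30

# A refcount capped at 2**30 - 1 needs only 30 bits of the 32-bit
# field, leaving the top 2 bits free for other purposes.
free_bits = ptr_bits - (max_refs - 1).bit_length()
assert free_bits == 2
```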
On Wed, Dec 15, 2021 at 6:16 AM Antoine Pitrou <antoine@python.org> wrote:
Did you try to take into account the envisioned project for adding a "complete" GC and removing the GIL?
Yeah. I was going to start a separate thread about per-interpreter GIL vs. no-gil, but figured I was already pushing my luck with 3 simultaneous related threads here. :) It would definitely be covered by the info doc/PEP. -eric
On Tue, Dec 14, 2021 at 10:12 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
* it is fully backward compatible and the C-API is essentially unaffected
Hmm, this is a little misleading. It will definitely be backward incompatible for extension modules that don't work under multiple subinterpreters (or rely on the GIL to protect global state). Hence that other thread I started. :) -eric
On Wed, 15 Dec 2021, 3:18 am Eric Snow, <ericsnowcurrently@gmail.com> wrote:
Hi all,
I'm still hoping to land a per-interpreter GIL for 3.11. There is still a decent amount of work to be done but little of it will require solving any big problems:
* pull remaining static globals into _PyRuntimeState and PyInterpreterState * minor updates to PEP 554 * finish up the last couple pieces of the PEP 554 implementation * maybe publish a companion PEP about per-interpreter GIL
There are also a few decisions to be made. I'll open a couple of other threads to get feedback on those. Here I'd like your thoughts on the following:
Do we need a PEP about per-interpreter GIL?
I haven't thought there would be much value in such a PEP. There doesn't seem to be any decision that needs to be made. At best the PEP would be an explanation of the project, where:
* the objective has gotten a lot of support (and we're working on addressing the concerns of the few objectors) * most of the required work is worth doing regardless (e.g. improve runtime init/fini, eliminate static globals) * the performance impact is likely to be a net improvement * it is fully backward compatible and the C-API is essentially unaffected
So the value of a PEP would be in consolidating an explanation of the project into a single document. It seems like a poor fit for a PEP.
I think PEP 630 (Petr's summary of the improvements to extension module reloading) is a good example of such a PEP being valuable. Writing such a PEP also provides a place to summarise key design decisions and known limitations (e.g. do the threading module primitives work for cross-interpreter synchronisation? If not, what should be used instead? multiprocessing? Something new that is still to be defined?). Cheers, Nick.
participants (14)
- Antoine Pitrou
- Christopher Barker
- Eric Snow
- Guido van Rossum
- Inada Naoki
- Itamar O
- Nathaniel Smith
- Neil Schemenauer
- Nick Coghlan
- Pablo Galindo Salgado
- Petr Viktorin
- Steven D'Aprano
- Trent Nelson
- Victor Stinner