
On Wed, Feb 23, 2022 at 10:12 AM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Thanks for the feedback. I've responded inline below.
-eric
On Sat, Feb 19, 2022 at 8:50 PM Inada Naoki <songofacandy@gmail.com> wrote:
I hope per-interpreter GIL succeeds at some point, and I know this is needed for per-interpreter GIL.
But I worry that per-interpreter GIL may be too complex for core developers and extension writers to implement and maintain. As you know, immortal doesn't mean shareable between interpreters. It is too difficult to know which objects can be shared, and where shareable objects leak into other interpreters. So I am not sure that per-interpreter GIL is an achievable goal.
I plan on addressing this in the PEP I am working on for per-interpreter GIL. In the meantime, I doubt the issue will impact any core devs.
It's nice to hear!
So I think it's too early to introduce immortal objects in Python 3.11, unless they *improve* performance without per-interpreter GIL. Instead, we can add a configuration option such as `--enable-experimental-immortal`.
I agree that immortal objects aren't quite as appealing in general without per-interpreter GIL. However, there are actual users that will benefit from it, assuming we can reduce the performance penalty to acceptable levels. For a recent example, see https://mail.python.org/archives/list/python-dev@python.org/message/B77BQQFD....
It is not a proven example, just a hope at the moment. So an option is fine for proving the idea. Although I cannot read the code, they said "patching ASLR by patching `ob_type` fields;". That will cause CoW for most objects, won't it? So reducing memory writes doesn't directly mean reducing CoW. Unless we can stop writing to a page completely, the page will be copied.
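To make the page-granularity argument concrete, here is a minimal sketch. It is only a model, not a measurement: the 4096-byte page size and the 16-byte back-to-back object headers are assumptions chosen for illustration, and `dirty_pages` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumed page size; real systems may differ


def dirty_pages(write_addresses, page_size=PAGE_SIZE):
    """Return the set of page numbers touched by at least one write."""
    return {addr // page_size for addr in write_addresses}


# Hypothetical layout: 512 object headers of 16 bytes each, packed
# back to back, which spans exactly two pages.
all_writes = [i * 16 for i in range(512)]

# Suppose most writes are eliminated, leaving one write per page.
fewer_writes = [addr for addr in all_writes if addr % PAGE_SIZE == 0]

pages_before = dirty_pages(all_writes)
pages_after = dirty_pages(fewer_writes)
```

In this model the number of writes drops from 512 to 2, but `pages_before` and `pages_after` are the same set, so the same pages would still be copied.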
On Sat, Feb 19, 2022 at 4:52 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
Reducing CPU Cache Invalidation
-------------------------------

Avoiding Data Races
-------------------
Both benefits require a per-interpreter GIL.
CPU cache invalidation exists regardless. With the current GIL the effect is reduced significantly.
It's an interesting point. We cannot see the benefit from pyperformance, because it doesn't use much data and it runs one process at a time. So pyperformance cannot put enough stress on the last-level cache, which is shared by many cores. We need a multiprocess performance benchmark apart from pyperformance to stress the last-level cache from multiple cores. It would help not only this PEP, but also optimizing containers like dict and set.
As I wrote before, fork is very difficult to use safely. We cannot recommend it to many users. And I don't think reducing the size of the patch at Instagram or YouTube is a good rationale for this kind of change.
What do you mean by "this kind of change"? The proposed change is relatively small. It certainly isn't nearly as intrusive as many changes we make to internals without a PEP. If you are talking about the performance penalty, we should be able to eliminate it.
Can the proposed optimizations to eliminate the penalty guarantee that no __del__ or weakref is broken, and that no memory leak occurs when the Python interpreter is initialized and finalized multiple times? I haven't confirmed it yet.
Also note that "fork" isn't the only operating system mechanism that uses copy-on-write semantics. Anything that uses ``mmap`` relies on copy-on-write, including sharing data from shared object files between processes.
It is very difficult to reduce CoW with mmap(MAP_PRIVATE).
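For readers unfamiliar with private mappings, the following sketch shows the copy-on-write behaviour from Python. It uses the standard library's `mmap.ACCESS_COPY` mode, which as far as I know corresponds to `MAP_PRIVATE` on Unix; `private_write_demo` is a hypothetical helper name and the payload is arbitrary.

```python
import mmap
import os
import tempfile


def private_write_demo(payload=b"hello world"):
    """Write through a copy-on-write mapping and compare with the file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        with open(path, "r+b") as f:
            # ACCESS_COPY: writes affect this process's memory only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:5] = b"HELLO"   # first write forces a private page copy
                mapped = m[:]
        with open(path, "rb") as f:
            on_disk = f.read()      # the underlying file is untouched
    finally:
        os.remove(path)
    return mapped, on_disk


mapped, on_disk = private_write_demo()
```

The mapping sees the modified bytes while the file keeps the original ones, which is the same mechanism that makes a single write to a shared page costly.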
You may need to write the hash of bytes and unicode objects. You may need to write `ob_type`. Immortal objects can "reduce" memory writes. But "at least one memory write" is enough to trigger CoW.
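As a small illustration of why even read-only use of an object writes to its memory, this sketch observes the reference count of an ordinary object with `sys.getrefcount`. It assumes a standard CPython build, and `refcount_delta` is a hypothetical helper; it says nothing about how immortal objects behave.

```python
import sys


def refcount_delta(n=10):
    """Return how much an object's refcount grows when n references are taken."""
    obj = object()                 # fresh, ordinary (non-immortal) object
    before = sys.getrefcount(obj)
    holders = [obj] * n            # never mutates obj, yet stores n references
    after = sys.getrefcount(obj)
    del holders                    # dropping the list writes the field again
    return after - before


delta = refcount_delta(10)
```

On a standard CPython build `delta` equals the number of references taken, so the object's header is written even though the object itself is never modified from the program's point of view.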
Correct. However, without immortal objects (AKA immutable per-object runtime-state) it goes from "very difficult" to "basically impossible".
A configuration option won't make it impossible.
Constraints
-----------

* ensure that otherwise immutable objects can be truly immutable
* be careful when immortalizing objects that are not otherwise immutable
I am not sure what this means. For example, unicode objects are not immutable because they have a hash, a utf8 cache, and a wchar_t cache. (The wchar_t cache will be removed in Python 3.12.)
I think you understood it correctly. In the case of str objects, they are close enough since a race on any of those values will not cause a different outcome.
I will clarify the point in the PEP.
FWIW, I filed an issue to remove the hash cache from bytes objects.
https://github.com/faster-cpython/ideas/issues/290
Code objects hold many bytes objects (e.g. co_code, co_linetable). Removing the cache will save some RAM and make immortal bytes truly immutable and safe to share between interpreters.

--
Inada Naoki <songofacandy@gmail.com>