Python: Different native runtime state tiers.

NOTE: This is copied straight from the Python-Ideas mailing list.
CPython has at least two kinds of native state: interpreter state and module state. Unfortunately, multi-phase initialization has a weakness when it comes to module state.
You can't access the module state without a pointer to the module. At first glance PyState_FindModule looks like the obvious answer, but its documentation states that it is unfit for multi-phase initialization.
I'm proposing an idea here for discussion: a new state system, at least for CPython.
Tier 1: Core state
- This state lives within CPython's core binary and exists for the entire lifetime of the binary.
- The data held within this state is available to the main interpreter and subsequent sub-interpreters. (Example: sys.executable)

Tier 2-1: Interpreter state (Branches from Core state)
- This state lives within CPython's core binary and is tied to a specific interpreter.
- The data held within this state is only available to the interpreter it's tied to. (Example: Modules loaded into memory)

Tier 2-2: Extension state (Branches from Core state)
- This state lives within CPython's core binary, EXCEPT its size and structure are defined by the extension CPython has loaded.
- The purpose of this state is to allow an extension to hold data that can't be tied to a specific module. (Examples can be: Windows WSA, MySQL)

Tier 3-1: Thread state (Branches from Interpreter State)
- This state lives within CPython's core binary and is tied to a specific Python thread (i.e. threads from the threading library).
- The data held within this state is only available to the thread it's tied to. (No known examples available)

Tier 3-2: Module state (Branches from Interpreter State)
- This type of state is already available in CPython, so explaining it is not required.
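
To make the "branches from" relationships above a bit more concrete, here is a rough sketch in plain C. Every type and function name in it (core_state, interp_state, core_from_thread, and so on) is hypothetical and made up for illustration; none of this is CPython's actual layout or API. It only shows each tier holding a pointer to the tier it branches from, so a lower tier can reach Core state by following pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not CPython's real structures. */

/* Tier 1: one per runtime, shared by every interpreter. Plain C data only. */
typedef struct core_state {
    const char *executable;   /* e.g. the C string sys.executable is built from */
} core_state;

/* Tier 2-1: one per interpreter, branches from Core state. */
typedef struct interp_state {
    core_state *core;
    int id;
} interp_state;

/* Tier 2-2: held by the core, but sized and laid out by the extension. */
typedef struct extension_state {
    core_state *core;
    size_t size;              /* declared by the extension */
    void *data;               /* opaque, extension-defined block */
} extension_state;

/* Tier 3-1: one per Python thread, branches from Interpreter state. */
typedef struct thread_state {
    interp_state *interp;
    unsigned long thread_id;
} thread_state;

/* Tier 3-2: one per module object, branches from Interpreter state. */
typedef struct module_state {
    interp_state *interp;
    void *data;
} module_state;

/* Walk up the branches: thread -> interpreter -> core. */
static core_state *core_from_thread(const thread_state *ts) {
    assert(ts != NULL && ts->interp != NULL);
    return ts->interp->core;
}

/* Walk up the branches: module -> interpreter -> core. */
static core_state *core_from_module(const module_state *ms) {
    assert(ms != NULL && ms->interp != NULL);
    return ms->interp->core;
}
```

With this shape, a thread in a sub-interpreter and a module in the main interpreter both end up at the same core_state, while their interp_state pointers differ.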

Since Petr Viktorin didn't add a formal reply to this, I'm going to do so instead. They emailed me some questions, which this reply includes along with my answers.

Q)
- What do you mean by "core binary"?
- If Python is embedded into another application, what happens to the state?
- If the "main" interpreter is stopped and re-initialized (Py_FinalizeEx(), then Py_InitializeEx()), is the state recreated?
A)
- The term "core binary" refers to the binary file containing the compiled result of "pythoncore".
- Nothing changes other than which binary is used to allocate the object (C global variable).
- Generally speaking, Py_InitializeEx()/Py_FinalizeEx() denote the lifetime of the core binary.

Q) So, if some Python code modifies sys.executable, the change should affect all interpreters in the process?
A) No. Core State cannot hold any Python heap-allocated objects. Also, virtually all first-party Python types copy data into newly allocated memory.

Q)
- What does "Branches from Core State" mean?
- How will this state be available to the interpreter it's tied to?
A)
- Informally, this means that the Core State can be accessed with just a pointer to the active Interpreter State.
  Note: This can be accomplished via either a constant pointer inside the Interpreter State or (Global State only) with a global C variable.
- See the above answer, but from the Thread State & Interpreter State branching viewpoint.
  Note: The active Thread State assumes we're not changing how threading works in Python, BUT it's flexible enough for thread_local storage.

Q) How is this different from C "static" storage?
A) Generally speaking, it's not different at all. This is the only State definition that can be dropped altogether if widely unpopular.
   Note: On the flip side, we would still require an initialization/finalization for extensions to implement (see Windows WSA/MySQL documentation).

On 2020-07-16 20:40, William Pickard wrote:
Since Petr Viktorin didn't add a formal reply to this, I'm going to do so instead. They emailed me some questions which this reply will have, along with my answers for them.
Oops! I did reply privately; sorry for not including the list.
I've since written what might be seen as a counter-proposal here: https://mail.python.org/archives/list/capi-sig@python.org/thread/6CGIIZVMJRY...
Basically, I would like to ensure that you *can* always get a pointer to the module. In that model, "Extension state" becomes unnecessary for most uses, but it can be emulated by static storage (using reference counting and locking to allow it to be used by several module objects). I consider "thread state" out of scope; that can be done with TSS, with the Py_tss_t *key in either module state or (in IMO rare cases) the above "Extension state" emulation.
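
As a rough illustration of the emulation described above (static storage shared by several module objects, guarded by reference counting and locking), a minimal sketch in plain C might look like the following. The names shared_ext_state, shared_acquire, shared_release and shared_ref_count are hypothetical, and the use of a pthread mutex is an assumption made for the sketch. Where these calls would go is also an assumption: for instance, shared_acquire from a module's Py_mod_exec function and shared_release from its m_free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical process-wide data that cannot be tied to one module object. */
typedef struct shared_ext_state {
    int handle;               /* stand-in for a library-wide resource */
} shared_ext_state;

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static shared_ext_state *shared = NULL;
static size_t shared_refs = 0;

/* Take a reference; the first caller creates the shared state.
   Returns NULL if allocation fails. */
static shared_ext_state *shared_acquire(void) {
    shared_ext_state *result = NULL;
    pthread_mutex_lock(&shared_lock);
    if (shared_refs == 0) {
        shared = malloc(sizeof *shared);
        if (shared != NULL) {
            shared->handle = 1;
        }
    }
    if (shared != NULL) {
        shared_refs++;
        result = shared;
    }
    pthread_mutex_unlock(&shared_lock);
    return result;
}

/* Drop a reference; the last caller destroys the shared state. */
static void shared_release(void) {
    pthread_mutex_lock(&shared_lock);
    assert(shared_refs > 0);
    if (--shared_refs == 0) {
        free(shared);
        shared = NULL;
    }
    pthread_mutex_unlock(&shared_lock);
}

/* Current number of holders, read under the lock. */
static size_t shared_ref_count(void) {
    pthread_mutex_lock(&shared_lock);
    size_t n = shared_refs;
    pthread_mutex_unlock(&shared_lock);
    return n;
}
```

Tying acquire/release to module object creation and destruction keeps the shared data on the regular object lifecycle, with no separate extension-level init/finalize hook.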
I'm afraid I'll need to ask you to be more exact. A "binary file" doesn't really have a lifetime (unless you count "from creation until deleting from the disk", which isn't helpful). Can I assume you mean the process? You can call Py_InitializeEx()/Py_FinalizeEx() multiple times in a single process. So to be clear, which one should be the lifetime of "Core state"?
A. from Py_InitializeEx() to Py_FinalizeEx()
B. from first Python initialization to the process shutdown
Do you have a link to that documentation? I'm not familiar enough with these to find the relevant docs myself. (To be honest I'm not sure what WSA is.)
Anyway, I would like to avoid extra initialization/finalization hooks (which seem necessary for "Extension state"), and base as much as possible on the regular PyObject* lifecycle (of the module objects). Running Python code in interpreter shutdown is hard to get right, especially if you allow references to Python objects and so you need the full GC machinery.

Petr Viktorin wrote:
- The lifetime I was referring to for a binary file is from when it's loaded into memory, to the moment it's unloaded.
- Since you stated that Py_InitializeEx()/Py_FinalizeEx() can be invoked multiple times, Core State's lifetime is tied to those calls. (Option A)
- This also means Core State cannot be a global static storage item; it has to be a heap-allocated object.
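
A small sketch of what Option A could look like, in plain C: the Core State object is allocated on the heap at initialization and freed at finalization, so each init/finalize cycle gets a fresh one. The functions runtime_initialize and runtime_finalize are hypothetical stand-ins for work that would happen inside Py_InitializeEx() and Py_FinalizeEx(); they are not real CPython functions. Keeping only a static pointer to the heap object is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical Core State; holds plain C data only. */
typedef struct core_state {
    unsigned long generation;   /* which init/finalize cycle created it */
} core_state;

static core_state *current_core = NULL;   /* only the pointer is static */
static unsigned long init_count = 0;

/* Stand-in for the Core State setup inside Py_InitializeEx().
   Returns 0 on success, -1 if allocation fails. */
static int runtime_initialize(void) {
    if (current_core != NULL) {
        return 0;               /* already initialized: nothing to do */
    }
    core_state *core = calloc(1, sizeof *core);
    if (core == NULL) {
        return -1;
    }
    core->generation = ++init_count;
    current_core = core;
    return 0;
}

/* Stand-in for the Core State teardown inside Py_FinalizeEx(). */
static void runtime_finalize(void) {
    free(current_core);
    current_core = NULL;
}

/* NULL whenever the runtime is not initialized. */
static core_state *runtime_core(void) {
    return current_core;
}

/* Generation of the live Core State; requires an initialized runtime. */
static unsigned long core_generation(void) {
    assert(current_core != NULL);
    return current_core->generation;
}
```

After a finalize followed by a second initialize, runtime_core() returns a newly allocated object with a new generation number, which is the behaviour Option A describes.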
After reviewing things, I realized Winsock and MySQL don't fit the bill as I thought they did. BUT that doesn't mean we don't need extension-specific initialization/finalization; I just haven't found a library that requires it.

Participants (2): Petr Viktorin, William Pickard