Mailman 3 Python: Different native runtime state tiers. - capi-sig

Python: Different native runtime state tiers.

William Pickard

9 Jul 2020

1:46 p.m.

NOTE: This is copied straight from Python-Ideas mailing list.

CPython has at least two different types of native state: Interpreter state and Module state. Unfortunately, multi-phase initialization has a weakness when it comes to Module state.

You can't access the module state without a pointer to the module. PyState_GetModule looks like the obvious answer, but its documentation states that it is unfit for multi-phase initialization.

I'm proposing an idea here for discussion on a new state system for at least CPython.

Tier 1: Core state

- This state lives within CPython's core binary and exists for the entire lifetime of the binary.
- The data held within this state is available to the main interpreter and subsequent sub-interpreters. (Example: sys.executable)

Tier 2-1: Interpreter state (Branches from Core state)

- This state lives within CPython's core binary and is tied to a specific interpreter.
- The data held within this state is only available to the interpreter it's tied to. (Example: Modules loaded into memory)

Tier 2-2: Extension state (Branches from Core state)

- This state lives within CPython's core binary, EXCEPT its size and structure are defined by the extension CPython has loaded.
- The purpose of this state is to allow an extension to hold data that can't be tied to a specific module. (Examples can be: Windows WSA, MySQL)

Tier 3-1: Thread state (Branches from Interpreter State)

- This state lives within CPython's core binary and is tied to a specific Python thread (i.e., threads from the threading library).
- The data held within this state is only available to the thread it's tied to. (No known examples available)

Tier 3-2: Module state (Branches from Interpreter State)

- This type of state is already available in CPython; explaining it is not required.
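To make the "branches from" relationship concrete, here is a minimal plain-C sketch of the tiers as structs that point back to their parent tier. All struct, field, and function names are hypothetical and chosen for illustration only; they are not CPython's actual data structures, and Tier 2-2 (Extension state) is left out because its layout would be defined by each extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the proposed tiers; NOT CPython's real structs. */

typedef struct CoreState {
    const char *executable;   /* process-wide data, e.g. the executable path */
} CoreState;

typedef struct InterpState {
    CoreState *core;          /* Tier 2-1 branches from Core state */
    int loaded_modules;       /* stand-in for per-interpreter data */
} InterpState;

typedef struct ThreadState {
    InterpState *interp;      /* Tier 3-1 branches from Interpreter state */
} ThreadState;

typedef struct ModuleState {
    InterpState *interp;      /* Tier 3-2 also branches from Interpreter state */
    int counter;              /* stand-in for per-module data */
} ModuleState;

/* Walk up the hierarchy from a thread to the shared core state. */
static CoreState *core_from_thread(const ThreadState *ts)
{
    assert(ts != NULL && ts->interp != NULL);
    return ts->interp->core;
}
```

In this model, two interpreters share one CoreState but keep separate per-interpreter data, which is the property the tier list describes.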


William Pickard

16 Jul

11:40 a.m.

Since Petr Viktorin didn't add a formal reply to this, I'm going to do so instead. They emailed me some questions, which I reproduce here along with my answers.

Q)
- What do you mean by "core binary"?
- If Python is embedded into another application, what happens to the state?
- If the "main" interpreter is stopped and re-initialized (Py_FinalizeEx(), then Py_InitializeEx()), is the state recreated?

A)
- The term "core binary" refers to the binary file containing the compiled result of "pythoncore".
- Nothing changes other than which binary is used to allocate the object (C global variable).
- Generally speaking, Py_InitializeEx()/Py_FinalizeEx() denote the lifetime of the core binary.

Q) So, if some Python code modifies sys.executable, the change should affect all interpreters in the process?

A) No. Core State cannot hold any Python heap-allocated objects; also, virtually all Python first-party types COPY data into newly allocated memory.

Q)
- What does "Branches from Core State" mean?
- How will this state be available to the interpreter it's tied to?

A)
- Put casually, this means that the Core State can be reached with just a pointer to the active Interpreter State. Note: This can be accomplished via either a constant pointer inside the Interpreter State or (Global State only) with a global C variable.
- See the above answer, but from the Thread State & Interpreter State branching viewpoint. Note: The active Thread State assumes we're not changing how threading works in Python, BUT it's flexible enough for thread_local storage.

Q) How is this different from C "static" storage?

A) Generally speaking, it's not different at all. This is the only State definition that can be dropped altogether if widely unpopular. Note: On the flip side, we will still require an initialization/finalization for extensions to implement (see Windows WSA/MySQL documentation).

Petr Viktorin

22 Jul

2:54 a.m.

On 2020-07-16 20:40, William Pickard wrote:

...

Since Petr Viktorin didn't add a formal reply to this, I'm going to do so instead. They emailed me some questions which this reply will have, along with my answers for them.

Oops! I did reply privately; sorry for not including the list.

I've since written what might be seen as a counter-proposal here: https://mail.python.org/archives/list/capi-sig@python.org/thread/6CGIIZVMJRY...

Basically, I would like to ensure that you *can* always get a pointer to the module. In that model, "Extension state" becomes unnecessary for most uses, but it can be emulated by static storage (using reference counting and locking to allow it to be used by several module objects). I consider "thread state" out of scope; that can be done with TSS, with the Py_tss_t *key in either module state or (in IMO rare cases) the above "Extension state" emulation.
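As a rough illustration of emulating "Extension state" with static storage, reference counting, and locking, here is a minimal plain-C sketch. The names ExtState, ext_state_acquire, and ext_state_release are hypothetical, and a POSIX mutex stands in for whatever lock an extension would actually use; this is a sketch under those assumptions, not code from the counter-proposal.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical process-wide "extension state" kept in static storage. */
typedef struct ExtState {
    int handle;               /* stand-in for a library-wide resource */
} ExtState;

static pthread_mutex_t ext_lock = PTHREAD_MUTEX_INITIALIZER;
static ExtState *ext_state = NULL;
static int ext_refcount = 0;

/* Called when a module object is created: the first caller allocates
 * the shared state. Returns NULL if the allocation fails. */
static ExtState *ext_state_acquire(void)
{
    ExtState *result;

    pthread_mutex_lock(&ext_lock);
    if (ext_refcount == 0) {
        ext_state = calloc(1, sizeof(*ext_state));
    }
    if (ext_state != NULL) {
        ext_refcount++;
    }
    result = ext_state;
    pthread_mutex_unlock(&ext_lock);
    return result;
}

/* Called when a module object is destroyed: the last caller frees
 * the shared state. */
static void ext_state_release(void)
{
    pthread_mutex_lock(&ext_lock);
    assert(ext_refcount > 0);
    if (--ext_refcount == 0) {
        free(ext_state);
        ext_state = NULL;
    }
    pthread_mutex_unlock(&ext_lock);
}
```

In a real extension, one plausible wiring (an assumption, not something stated in this thread) would be to call the acquire function from the module's Py_mod_exec slot and the release function from its m_free hook, so the shared state follows the lifecycle of the module objects.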

...

Q)
- What do you mean by "core binary"?
- If Python is embedded into another application, what happens to the state?
- If the "main" interpreter is stopped and re-initialized (Py_FinalizeEx(), then Py_InitializeEx()), is the state recreated?

A)
- The term "core binary" refers to the binary file containing the compiled result of "pythoncore".
- Nothing changes other than which binary is used to allocate the object (C global variable).
- Generally speaking, Py_InitializeEx()/Py_FinalizeEx() denote the lifetime of the core binary.

I'm afraid I'll need to ask you to be more exact. A "binary file" doesn't really have a lifetime (unless you count "from creation until deleting from the disk", which isn't helpful). Can I assume you mean the process? You can call Py_InitializeEx()/Py_FinalizeEx() multiple times in a single process. So to be clear, which one should be the lifetime of "Core state"?

A. from Py_InitializeEx() to Py_FinalizeEx()
B. from first Python initialization to the process shutdown

...

Q) So, if some Python code modifies sys.executable, the change should affect all interpreters in the process?

A) No. Core State cannot hold any Python heap-allocated objects; also, virtually all Python first-party types COPY data into newly allocated memory.

Q)
- What does "Branches from Core State" mean?
- How will this state be available to the interpreter it's tied to?

A)
- Put casually, this means that the Core State can be reached with just a pointer to the active Interpreter State. Note: This can be accomplished via either a constant pointer inside the Interpreter State or (Global State only) with a global C variable.
- See the above answer, but from the Thread State & Interpreter State branching viewpoint. Note: The active Thread State assumes we're not changing how threading works in Python, BUT it's flexible enough for thread_local storage.

Q) How is this different from C "static" storage?

A) Generally speaking, it's not different at all. This is the only State definition that can be dropped altogether if widely unpopular. Note: On the flip side, we will still require an initialization/finalization for extensions to implement (see Windows WSA/MySQL documentation).

Do you have a link to that documentation? I'm not familiar enough with these to find the relevant docs myself. (To be honest I'm not sure what WSA is.)

Anyway, I would like to avoid extra initialization/finalization hooks (which seem necessary for "Extension state"), and base as much as possible on the regular PyObject* lifecycle (of the module objects). Running Python code in interpreter shutdown is hard to get right, especially if you allow references to Python objects and so you need the full GC machinery.

William Pickard

7:14 a.m.

Petr Viktorin wrote:

...

...
Q)

What do you mean by "core binary"?

If Python is embedded into another application, what happens to the state?

If the "main" interpreter is stopped and re-initialized (Py_FinalizeEx() Py_InitializeEx()), is the state recreated?

A)

The term "core binary" refers to the binary file containing the compiled result of "pythoncore".

Nothing changes other than which binary is used to allocate the object (C global variable)

Generally speaking, Py_InitializeEx()/Py_FinalizeEx() denote the lifetime of the core binary.

I'm afraid I'll need to ask you to be more exact. A "binary file" doesn't really have a lifetime (unless you count "from creation until deleting from the disk", which isn't helpful). Can I assume you mean the process? You can call Py_InitializeEx()/Py_FinalizeEx() multiple times in a single process. So to be clear, which one should be the lifetime of "Core state"?

A. from Py_InitializeEx() to Py_FinalizeEx()
B. from first Python initialization to the process shutdown

The lifetime I was referring to for a binary file is from when it's loaded into memory, to the moment it's unloaded.
Since you stated that Py_InitializeEx()/Py_FinalizeEx() can be invoked multiple times, Core State's lifetime is tied to those calls. (Option A)
- This also means Core State cannot be a global static storage item; it must be a heap-allocated object.
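A minimal plain-C sketch of what Option A implies: the Core State is allocated on the heap at initialization and freed at finalization, so a later re-initialization starts from a fresh state. The functions model_initialize and model_finalize are hypothetical stand-ins for the work Py_InitializeEx() and Py_FinalizeEx() would do under this proposal; they are not real CPython functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical Core State; the single field is a stand-in for real data. */
typedef struct CoreState {
    int value;
} CoreState;

/* Stand-in for initialization: allocate a fresh, zeroed Core State.
 * Returns NULL if the allocation fails. */
static CoreState *model_initialize(void)
{
    return calloc(1, sizeof(CoreState));
}

/* Stand-in for finalization: the Core State's lifetime ends here. */
static void model_finalize(CoreState *core)
{
    assert(core != NULL);
    free(core);
}
```

Because nothing survives between a finalize and the next initialize in this model, any value stored in the first Core State is gone in the second one, which is the behavior Option A asks for.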

...

...
Q) So, if some Python code modifies sys.executable, the change should affect all interpreters in the process?

A) No. Core State cannot hold any Python heap-allocated objects; also, virtually all Python first-party types COPY data into newly allocated memory.

Q)

What does "Branches from Core State" mean?

How will this state be available to the interpreter it's tied to?

A)

Put casually, this means that the Core State can be reached with just a pointer to the active Interpreter State.
Note: This can be accomplished via either a constant pointer inside the Interpreter State or (Global State only) with a global C variable.

See the above answer, but from the Thread State & Interpreter State branching viewpoint. Note: The active Thread State assumes we're not changing how threading works in Python, BUT it's flexible enough for thread_local storage.

Q) How is this different from C "static" storage?

A) Generally speaking, it's not different at all. This is the only State definition that can be dropped altogether if widely unpopular. Note: On the flip side, we will still require an initialization/finalization for extensions to implement (see Windows WSA/MySQL documentation).

Do you have a link to that documentation? I'm not familiar enough with these to find the relevant docs myself. (To be honest I'm not sure what WSA is.)

Anyway, I would like to avoid extra initialization/finalization hooks (which seem necessary for "Extension state"), and base as much as possible on the regular PyObject* lifecycle (of the module objects). Running Python code in interpreter shutdown is hard to get right, especially if you allow references to Python objects and so you need the full GC machinery.

After reviewing things, I realized Winsock and MySQL don't fit the bill as I thought they did. BUT that doesn't mean we don't need extension-specific initialization/finalization; I just haven't found a library that requires it.
