New _Py_InitializeFromConfig() function (PEP 432)

Hi,

I finished my work on the _PyCoreConfig structure: it's a C structure in Include/pystate.h which has many fields used to configure Python initialization. In Python 3.6 and older, these parameters were scattered around the code, and it was hard to get an exhaustive list of them.

This work is linked to Nick Coghlan's PEP 432 "Restructuring the CPython startup sequence":
https://www.python.org/dev/peps/pep-0432/

Right now, the new API is still private. Nick Coghlan split the initialization in two parts: "core" and "main". I'm not sure that this split is needed. We should see what to do, but it would be nice to make the _PyCoreConfig API public! IMHO it's way better than the old way to configure Python initialization.

--

It is now possible to only use _PyCoreConfig to initialize Python: it overrides old ways to configure Python like environment variables (ex: PYTHONPATH), global configuration variables (ex: Py_BytesWarningFlag) and C functions (ex: Py_SetProgramName()).

I added tests to test_embed on the different ways to configure Python initialization:

* environment variables (ex: PYTHONPATH)
* global configuration variables (ex: Py_BytesWarningFlag) and C functions (ex: Py_SetProgramName())
* _PyCoreConfig

I found and fixed many issues when writing these tests :-)

Reading the current configuration, _PyCoreConfig_Read(), no longer changes the configuration. Now the code to read the configuration and the code to apply the configuration are properly separated.

The work is not fully complete: there are a few remaining corner cases and some parameters (ex: Py_FrozenFlag) which cannot be set by _PyCoreConfig yet. My latest issue used to work on this API:
https://bugs.python.org/issue34170

I had to refactor a lot of code to implement all of that.

--

The problem is that Python 3.7 got the half-baked implementation, and it caused issues:

* Calling Py_Main() after Py_Initialize() fails with a fatal error on Python 3.7.0
  https://bugs.python.org/issue34008
* PYTHONOPTIMIZE environment variable is ignored by Py_Initialize()
  https://bugs.python.org/issue34247

I fixed the first issue, and I'm now working on the second one to see how it can be fixed. The other option would be to backport the code from master to the 3.7 branch, since the code in master has a way better design. But it requires backporting a lot of changes. I'm not sure yet what the best option is.

Victor
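The separation between reading and applying the configuration described in this message can be illustrated with a toy model. This is a sketch in Python, not CPython's C code: `ToyConfig`, `read_config` and `apply_config` are hypothetical names invented for the illustration and are not the private _PyCoreConfig API. The `:` path separator assumes a POSIX-style PYTHONPATH.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a configuration structure."""
    program_name: str = "python"
    bytes_warning: int = 0
    module_search_paths: list = field(default_factory=list)


def read_config(environ):
    """Build a config from an environment mapping, with no side effects."""
    config = ToyConfig()
    path = environ.get("PYTHONPATH", "")
    if path:
        # Assumption: POSIX-style ':' separator.
        config.module_search_paths = path.split(":")
    return config


def apply_config(config, state):
    """Copy the config into the (toy) interpreter state: the only mutating step."""
    state["config"] = copy.deepcopy(config)
    return state


environ = {"PYTHONPATH": "/opt/app:/opt/lib"}
state = {}
config = read_config(environ)
assert state == {}  # reading alone changed nothing
apply_config(config, state)
```

In this model, calling `read_config()` any number of times leaves both `environ` and `state` untouched; only `apply_config()` changes anything.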

On Tue, 31 Jul 2018 at 15:16 Victor Stinner <vstinner@redhat.com> wrote:
Hi,
I finished my work on the _PyCoreConfig structure: it's a C structure in Include/pystate.h which has many fields used to configure Python initialization. In Python 3.6 and older, these parameters were scattered around the code, and it was hard to get an exhaustive list of them.
That's great! Thanks for doing this! I know I have always found it hard to track down where stuff in the start-up process ultimately gets set.
This work is linked to Nick Coghlan's PEP 432 "Restructuring the CPython startup sequence": https://www.python.org/dev/peps/pep-0432/
Right now, the new API is still private. Nick Coghlan split the initialization in two parts: "core" and "main". I'm not sure that this split is needed. We should see what to do, but it would be nice to make the _PyCoreConfig API public! IMHO it's way better than the old way to configure Python initialization.
--
It is now possible to only use _PyCoreConfig to initialize Python: it overrides old ways to configure Python like environment variables (ex: PYTHONPATH), global configuration variables (ex: Py_BytesWarningFlag) and C functions (ex: Py_SetProgramName()).
I added tests to test_embed on the different ways to configure Python initialization:
* environment variables (ex: PYTHONPATH)
* global configuration variables (ex: Py_BytesWarningFlag) and C functions (ex: Py_SetProgramName())
* _PyCoreConfig
I found and fixed many issues when writing these tests :-)
Reading the current configuration, _PyCoreConfig_Read(), no longer changes the configuration. Now the code to read the configuration and the code to apply the configuration is properly separated.
The work is not fully complete, there are a few remaining corner cases and some parameters (ex: Py_FrozenFlag) which cannot be set by _PyCoreConfig yet. My latest issue used to work on this API:
https://bugs.python.org/issue34170
I had to refactor a lot of code to implement all of that.
--
The problem is that Python 3.7 got the half-baked implementation, and it caused issues:
* Calling Py_Main() after Py_Initialize() fails with a fatal error on Python 3.7.0
  https://bugs.python.org/issue34008
* PYTHONOPTIMIZE environment variable is ignored by Py_Initialize()
  https://bugs.python.org/issue34247
I fixed the first issue, and I'm now working on the second one to see how it can be fixed. The other option would be to backport the code from master to the 3.7 branch, since the code in master has a way better design. But it requires backporting a lot of changes. I'm not sure yet what the best option is.
If it is not extremely painful to fix just the issue then I say don't backport the whole thing.

On Tue, Jul 31, 2018 at 4:15 PM Victor Stinner <vstinner@redhat.com> wrote:
I finished my work on the _PyCoreConfig structure:
\o/ Thanks for all the good work!
Right now, the new API is still private. Nick Coghlan split the initialization in two parts: "core" and "main". I'm not sure that this split is needed.
The "core" config is basically the config for the runtime. In fact, PEP 432 renamed "core" to "runtime". Please keep the firm distinction between the runtime and the (main) interpreter.
We should see what to do, but it would be nice to make the _PyCoreConfig API public! IMHO it's way better than the old way to configure Python initialization.
+1 However, shouldn't that happen after PEP 432 is accepted?
[snip]
The problem is that Python 3.7 got the half-baked implementation, and it caused issues:
* Calling Py_Main() after Py_Initialize() fails with a fatal error on Python 3.7.0
  https://bugs.python.org/issue34008
* PYTHONOPTIMIZE environment variable is ignored by Py_Initialize()
  https://bugs.python.org/issue34247
I fixed the first issue, and I'm now working on the second one to see how it can be fixed. The other option would be to backport the code from master to the 3.7 branch, since the code in master has a way better design. But it requires backporting a lot of changes. I'm not sure yet what the best option is.
Backporting shouldn't be so risky since it's all private API and there are few other changes in the relevant code since 3.7, right? It depends on if Ned's okay with it or not. :) -eric

2018-08-02 1:18 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
Backporting shouldn't be so risky since it's all private API and there are few other changes in the relevant code since 3.7, right? It depends on if Ned's okay with it or not. :)
I'm still doing further bug fixes and cleanup in the master branch:
https://bugs.python.org/issue34170

I'm doing more and more changes. I just added two new files: Include/coreconfig.h and Python/coreconfig.c. IMHO it's better to put similar code in separate files. FYI Python/coreconfig.c has 954 lines and Include/coreconfig.h has 319 lines.

I'm ok to rename the structure and the files if you prefer a different name.

About that, I'm working on a subproject: abandon Py_xxx global configuration variables to replace them with accessing interpreter->core_config->xxx. I'm not sure yet if it's a good idea or not, but it would allow two interpreters to have their own different configuration. Imagine two interpreters with different sys.path running in isolated mode. Or maybe an interpreter without importlib?

One of the issues is that we now have two copies of the same option. For example, Py_BytesWarningFlag and interpreter->core_config->bytes_warning. That's why I would like to switch to core_config. But I'm also trying to make sure that the two variables have the same value:
https://github.com/python/cpython/commit/a4d20b2e5ece2120f129cb4dda951a6c246...

Victor

On Thu, Aug 2, 2018 at 3:50 AM Victor Stinner <vstinner@redhat.com> wrote:
I'm still doing further bug fixes and cleanup in the master branch: https://bugs.python.org/issue34170
I'm doing more and more changes.
Yeah, it's a question of what you plan to backport. As Barry suggested, it would be great if you had a WIP PR for the backport, just so Ned (and others) has a point of reference.
I just added two new files: Include/coreconfig.h and Python/coreconfig.c. IMHO it's better to put similar code in separate files.
FYI Python/coreconfig.c has 954 lines and Include/coreconfig.h has 319 lines.
Nick might have a better opinion, particularly when it comes to a C codebase, but I'm in favor of keeping separate things separate, especially when it relates to the runtime. (Historically we haven't been great about considering the runtime as a distinct part of CPython.) Hence +1 to keeping the runtime config separate, especially given the size of the files. :) Presumably pystate.c (and pystate.h) got smaller by roughly the same amount? Also, would it make sense for at least some of coreconfig.h to live in Include/internal, either as coreconfig.h or in the internal pystate.h?
I'm ok to rename the structure and the files if you prefer a different name.
I'd love to see all the "core" names changed to "runtime", in the same way that PEP 432 was updated. It was a point of confusion in the PEP until we changed the name, and doing so helped. I thought we had also changed the code, but apparently not. For that matter, I'd love to see PEP 432 and the codebase synced up. While the overall plan is still consistent, a number of details (e.g. the intent and content of the "core" config) have diverged.
About that, I'm working on a subproject: abandon Py_xxx global configuration variables to replace them with accessing interpreter->core_config->xxx.
+1

IMHO it's a natural consequence of having a runtime/core config. In fact, having both is problematic because it's easy for us to accidentally ignore a global-var-as-config (prior to Py_Initialize); the relationship between those global variables and runtime initialization isn't very obvious in the code (unlike the config struct). It's also confusing for embedders if we have both. As I've already expressed, I'm definitely in favor of improving encapsulation of the runtime (and moving away from globals). :)

Note that there are backward compatibility issues to deal with. AFAIU if we start ignoring those global variables during initialization then it's going to cause problems for embedders. If we get rid of the variables altogether then it would break extension modules that currently rely on reading those parts of the runtime config. So I'm guessing you planned on deprecating any use of those global variables and, in the spirit of your goals for the C-API, provide a public API for extensions to access the info in the runtime config instead.

FWIW, I recall that Nick and I discussed this relative to PEP 432 a while ago and remember the decision to stay with the status quo for now (to avoid scope creep in the PEP). Apparently that consideration did not get recorded in the PEP (that I could see with a quick skim of the text). The mailing lists may have the discussion somewhere.
I'm not sure yet if it's a good idea or not, but it would allow two interpreters to have their own different configuration. Imagine two interpreters with different sys.path running in isolated mode. Or maybe an interpreter without importlib?
Yeah, a number of interesting possibilities open up as we further encapsulate the runtime and move away from globals.
One of the issues is that we now have two copies of the same option. For example, Py_BytesWarningFlag and interpreter->core_config->bytes_warning. That's why I would like to switch to core_config.
+1
But I'm also trying to make sure that the two variables have the same value: https://github.com/python/cpython/commit/a4d20b2e5ece2120f129cb4dda951a6c246...
Yep. That is necessary while the global config variables still exist. It's just risky since it's easy for us to change the config but forget to update the global config vars (that are shadowing the config). It would probably be worth making sure we have tests that verify that the two stay synchronized. -eric
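The kind of synchronization check suggested above can be sketched with a toy model. This is plain Python, not a test from CPython's test suite: `GLOBAL_FLAGS` is a hypothetical stand-in for the Py_xxx global variables, and `set_option` and `out_of_sync` are names invented for the illustration.

```python
# Toy model: module-level "global flags" that shadow config fields.
GLOBAL_FLAGS = {"bytes_warning": 0, "verbose": 0}  # stand-in for Py_xxx globals


def set_option(config, name, value):
    """Update the config and its shadowing global in one place."""
    config[name] = value
    GLOBAL_FLAGS[name] = value


def out_of_sync(config):
    """Return the names whose global copy differs from the config value."""
    return sorted(
        name for name in GLOBAL_FLAGS if GLOBAL_FLAGS[name] != config.get(name)
    )


config = dict(GLOBAL_FLAGS)
set_option(config, "bytes_warning", 2)  # both copies updated together
config["verbose"] = 1                   # bypasses set_option: the global is now stale
```

Here `out_of_sync(config)` reports `verbose` as the only stale option, which is the sort of drift such a test would be meant to catch.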

2018-08-02 17:17 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
Note that there are backward compatibility issues to deal with. AFAIU if we start ignoring those global variables during initialization then it's going to cause problems for embedders.
One of the first operations of Py_Initialize(), Py_Main() and _PyCoreConfig_Read() is to get the current value of all global configuration variables. The change is more that modifying a global configuration variable after Py_Initialize() or Py_Main() may or may not have an effect. And in the future, it should no longer have any effect. In short, these variables should only be read to populate the initialization configuration, and then they should no longer change.
So I'm guessing you planned on deprecating any use of those global variables and, in the spirit of your goals for the C-API, provide a public API for extensions to access the info in the runtime config instead.
There is no *need* to deprecate anything. _PyCoreConfig remains fully compatible with them and there are now unit tests to make sure that their value is read at Python startup. The priority is: core config > global vars > env vars.
But I'm also trying to make sure that the two variables have the same value: https://github.com/python/cpython/commit/a4d20b2e5ece2120f129cb4dda951a6c246...
Yep. That is necessary while the global config variables still exist. It's just risky since it's easy for us to change the config but forget to update the global config vars (that are shadowing the config). It would probably be worth making sure we have tests that verify that the two stay synchronized.
If possible, I would prefer that the configuration is *not* modified after Python has been initialized. I am even considering marking PyInterpreterState.core_config as constant to prevent such changes. The idea would be to know exactly how Python has been initialized, to make the initialization more deterministic and explicit.

To come back to a concrete example:
https://github.com/python/cpython/commit/a4d20b2e5ece2120f129cb4dda951a6c246...

We can easily modify core_config->inspect before Python initialization. For this commit, it's just that I wanted to make tiny and incremental changes.

Victor
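The priority order stated in this message (core config > global vars > env vars), combined with a configuration that is read-only once built, can be sketched as follows. This is a toy model in Python, not CPython's implementation: `resolve` and `build_config` are hypothetical names, and treating `None` or a missing key as "not set" is an assumption of the model.

```python
from types import MappingProxyType


def resolve(name, core_config, global_vars, env_vars, default=0):
    """Return the first explicitly set value, highest priority first."""
    for source in (core_config, global_vars, env_vars):
        value = source.get(name)
        if value is not None:  # assumption: None (or a missing key) means "not set"
            return value
    return default


def build_config(names, core_config, global_vars, env_vars):
    """Resolve every option once, then expose the result as a read-only mapping."""
    resolved = {
        name: resolve(name, core_config, global_vars, env_vars) for name in names
    }
    return MappingProxyType(resolved)


final = build_config(
    ["optimization_level", "bytes_warning", "verbose"],
    core_config={"optimization_level": 2},
    global_vars={"optimization_level": 1, "bytes_warning": 1},
    env_vars={"bytes_warning": 2, "verbose": 1},
)
```

In this model `optimization_level` comes from the core config, `bytes_warning` from the global variable, and `verbose` from the environment. Assigning to `final[...]` afterwards raises `TypeError`, which models the "do not modify after initialization" idea.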

On 2 August 2018 at 19:49, Victor Stinner <vstinner@redhat.com> wrote:
About that, I'm working on a subproject: abandon Py_xxx global configuration variables to replace them with accessing interpreter->core_config->xxx. I'm not sure yet if it's a good idea or not, but it would allow two interpreters to have their own different configuration. Imagine two interpreters with different sys.path running in isolated mode. Or maybe an interpreter without importlib?
One of the issues is that we now have two copies of the same option. For example, Py_BytesWarningFlag and interpreter->core_config->bytes_warning. That's why I would like to switch to core_config.
One of the challenges we have around those is the backwards compatibility implications for embedding applications, so I suspect the earliest we'll be able to make that change is in the release after the new initialisation API becomes public. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2018-08-02 1:18 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
The "core" config is basically the config for the runtime. In fact, PEP 432 renamed "core" to "runtime". Please keep the firm distinction between the runtime and the (main) interpreter.
There is already something called _PyRuntime but it's shared between all interpreters. _PyCoreConfig is already *per* interpreter.

Would you mind elaborating on what you mean by the "main interpreter"? I don't see anything obvious in the current code about what a "main interpreter" is. Technically, I don't see anything like that.

I'm still not convinced that we need _PyMainInterpreterConfig: _PyCoreConfig contains the same information. Is it really worth it to duplicate all of _PyCoreConfig (more than 36 fields) in _PyMainInterpreterConfig? _PyMainInterpreterConfig adds a third copy of many parameters: another opportunity to introduce an inconsistency.

Right now, an interpreter contains both: core and main configurations...

I propose to *remove* _PyMainInterpreterConfig and rename _PyCoreConfig as _PyInterpreterConfig. I would also propose to merge Py_Initialize() again to have a single step instead of the current 2 steps: core step + main step.

Victor

Before I dive in, I'll say that I'd really like to hear Nick's opinion on all this. :) On Thu, Aug 2, 2018 at 9:59 AM Victor Stinner <vstinner@redhat.com> wrote:
2018-08-02 1:18 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
The "core" config is basically the config for the runtime. In fact, PEP 432 renamed "core" to "runtime". Please keep the firm distinction between the runtime and the (main) interpreter.
There is already something called _PyRuntime but it's shared between all interpreters.
_PyRuntime is a static global of type PyRuntimeState. It is where I consolidated (nearly) all the global runtime state last September.
_PyCoreConfig is already *per* interpreter.
This was done as part of the PEP 432 implementation, which I landed during PyCon 2017. If PyRuntimeState had existed already I'm sure it would be there instead.
Would you mind to elaborate what you mean by the "main interpreter"? I don't see anything obvious in the current code about what is a "main interpreter". Technically, I don't see anything like that.
The main interpreter is the first one created (during runtime initialization). It is special for a variety of reasons. Here are the ones I could think of:

1. the "main" thread will always belong to the main interpreter since it is the first PyThreadState created
2. runtime initialization uses the main interpreter exclusively
3. the first phase of runtime initialization (pre-initialization) ends with the main interpreter being *partially* configured
4. during the second phase (initializing), the partially-configured main interpreter facilitates the use of most of the C-API and may be used by embedders
   * this is the only time that an interpreter may be used in this way, and it only happens with the main interpreter
5. runtime finalization takes place using the main interpreter
6. the main interpreter is the last one destroyed during finalization
7. the REPL runs only in the main interpreter
8. the Python CLI is run in the main interpreter (i.e. in its __main__ module)
9. the main interpreter cannot be destroyed (except during finalization)
10. in Python code the main interpreter will always exist
11. it is the parent of all subinterpreters created in Python code (via PEP 554)
12. signals are handled only in the main interpreter
13. all single-threaded Python code is run in the main interpreter

Note that there isn't anything special to the interpreter itself, but rather in where and how it's used. However, that matters and the runtime needs to treat it specially. I expect all this isn't well-documented because it is relevant to very few people.
I'm still not convinced that we need _PyMainInterpreterConfig:
Let's step back a moment and consider the course of events:

1. PEP 432 was created nearly 6 years ago to address the tangle that runtime initialization had become, with the intent of helping both the CPython maintainers and embedders
2. Nick did some re-organization around then (e.g. factoring out pylifecycle.c) to facilitate an implementation of the PEP
3. Nick implemented PEP 432, with a plan to merge it as a *private* API regardless of whether or not the PEP was accepted (with general consensus that doing so was a good idea)
   * see https://bitbucket.org/ncoghlan/cpython_sandbox/branch/pep432_modular_bootstr...
   * landing the private API would allow us to iron out the details of the PEP
   * work happened in spurts in 2013, 2015, and 2016; I kept poking Nick because the implementation was a big blocker for my multi-core/subinterpreters project
4. leading up to (and at) PyCon 2017, I forked Nick's branch, moved it to github, rebased it onto master, got it working again, created a PR, and finally landed it
5. since then the implementation has changed a bunch (due to Victor's much appreciated efforts) and has diverged from the PEP
   * notably it's unclear that the code (especially pymain) strictly conforms to the phases in the PEP

At this point the PEP is out of date. There have been several mailing list threads (all python-dev, IIRC) and some BPO issues where Victor solicited clarification or expressed a desire to change things and Nick gave feedback. None of that made it into the PEP. :( Consequently the PEP is inconsistent with the actual target. Furthermore, as was intended, we've learned of a few ways that the PEP could be improved. We *really* need to get the PEP updated so we can be sure everyone has all the info.
Regarding the justification for the "main interpreter" config, the implementation has diverged from the original intent of the PEP:

* the core/runtime config was meant to hold the minimal data needed to bootstrap/initialize the basic (limited) functionality of the C-API, including a restricted main interpreter
  + the struct members were strictly C plain-old-types since using PyObject would require the runtime to already be (partially) initialized
  + in the last year a lot of data has been added to this config; I don't know how much is strictly necessary to bootstrap the runtime (end of phase 1) and how much could be dealt with in phase 2
* the "main interpreter" config was meant to hold all the config needed to finish initializing the runtime (end of phase 2)
  + the struct members were mostly PyObject* (possible since most builtin types are available at this point)
  + the PEP proposes a bunch more fields than the implementation has; we planned on adding them a few at a time
_PyCoreConfig contains the same information. Is it really worth it to duplicate all of _PyCoreConfig (more than 36 fields) in _PyMainInterpreterConfig? _PyMainInterpreterConfig adds a third copy of many parameters: another opportunity to introduce an inconsistency.
TBH, the PEP *should* have a clear answer for your question here, Victor. It has some explanation, but clearly it is incomplete (hence this continuing email thread).

The duplication is partly a consequence of what has happened in the last year: a bunch of fields were added to the core config that were not in the PEP. However, note the key differences between the two structs:

* core/runtime config
  + minimal
  + simple C fields
  + meant for embedders/pymain to bootstrap a limited runtime
  + not really meant to be used after calling Py_InitializeRuntime (AKA Py_InitializeCore)
* main interpreter config
  + includes everything needed to finish full runtime initialization
  + has PyObject* fields
  + meant for embedders/pymain to finish initializing the runtime
  + not really meant to be used after calling Py_ConfigureMainInterpreter (except when initializing a subinterpreter)

Originally there wasn't much overlap. Furthermore, both of them are kept around so that, via the C-API (or directly in the CPython impl.), we could expose what data was used to initialize the runtime. This fills much the same role as the existing global Py_* variables.

The duplication is due to there being C and PyObject versions. It is for the sake of embedders (and a little bit of sanity). The big reason why it shouldn't be a problem is because PyMainInterpreterConfig is generated directly from PyRuntimeConfig (AKA PyCoreConfig) and only *after* we've used the runtime config to bootstrap the limited runtime (after which it shouldn't be modified ever). So there's no risk of inconsistency, right? Perhaps it would make sense to only keep a const copy of both, to avoid modification?
Right now, an interpreter contains both: core and main configurations...
As noted above, the core/runtime config should probably be on PyRuntimeState instead. Regarding the "main" config, PyMainInterpreterConfig probably makes more sense as one of the following:

1. on PyRuntimeState, like the core/runtime config (since it's a one-off)
2. on PyInterpreterState, like now, but set to NULL on all but the main interpreter (which would allow us to distinguish the main interpreter from the rest)

Both would require PyInterpreterConfig from PEP 432, but expanded to cover all config that might be unique to an interpreter.

Also, conceptually there's a difference between the-config-used-to-finish-runtime-init and the config-used-to-initialize-an-interpreter (including the main interpreter). In fact, PEP 432 does include a PyInterpreterConfig. However, in the current implementation, PyMainInterpreterConfig fills that role exclusively, which is confusing since we use the "main interpreter" config to initialize all interpreters (not just the main one).

So here's what might make sense to do:

1. rename "core" to "runtime" (to reduce confusion)
2. move PyInterpreterState.runtime_config to PyRuntimeState.config
   + prevent modification after Py_InitializeRuntime() is called (e.g. keep a const copy)?
3. move PyInterpreterState.config to PyRuntimeConfig.main_config
   + prevent modification after Py_ConfigureMainInterpreter() is called (e.g. keep a const copy)?
   + keep the PyMainInterpreterConfig and Py_ConfigureMainInterpreter names
4. add PyInterpreterConfig with only the parts of PyMainInterpreterConfig needed to initialize any interpreter
   + add Py_NewInterpreterEx(PyInterpreterConfig) to allow explicitly passing a config?
5. add PyInterpreterState.config (type PyInterpreterConfig) to record the config used to initialize that interpreter
   + prevent modification after the interpreter is initialized (e.g. keep a const copy)?
I propose to *remove* _PyMainInterpreterConfig and rename _PyCoreConfig as _PyInterpreterConfig. I would also propose to merge again Py_Initialize() to have a single step instead of the current core step + main step: 2 steps.
So you are not in favor of PEP 432 then. :) -eric
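The two-phase split discussed in this message (a small config of plain values to bootstrap a limited runtime, then a richer config derived from it to finish initialization) can be sketched with a toy model. This is Python, not the PEP 432 API: every class, field and method name below is hypothetical, and the real structs hold far more than these few fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRuntimeConfig:
    """Phase 1 input: plain values only (models 'simple C fields')."""
    use_environment: bool = True
    program_name: str = "python"


@dataclass(frozen=True)
class ToyMainInterpreterConfig:
    """Phase 2 input: richer objects, derived from the runtime config."""
    argv: tuple
    module_search_paths: tuple


class ToyRuntime:
    def __init__(self):
        self.runtime_config = None
        self.main_config = None

    def initialize_runtime(self, runtime_config):
        # End of phase 1: a limited runtime is available.
        self.runtime_config = runtime_config

    def configure_main_interpreter(self, extra_paths=()):
        # Phase 2 is only valid once phase 1 has completed.
        if self.runtime_config is None:
            raise RuntimeError("runtime not initialized")
        self.main_config = ToyMainInterpreterConfig(
            argv=(self.runtime_config.program_name,),
            module_search_paths=tuple(extra_paths),
        )


runtime = ToyRuntime()
runtime.initialize_runtime(ToyRuntimeConfig(program_name="embedded"))
runtime.configure_main_interpreter(["/opt/app"])
```

Both config classes are frozen dataclasses, which models the "keep a const copy" suggestion: once a config has been handed to the runtime, assigning to one of its fields raises an exception.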

It seems like PEP 432 proposes an API designed from scratch as the target API. I started from the 28-year-old C code and I tried to clean up the code. Our methods are different, so it's not surprising that the result is different :-)

My intent is to get:

* a function to read *all* configuration with no side effect and put it into a single structure: _PyCoreConfig
* modify Py_Main() and all variants of Py_Initialize() to always end in the same code path using _PyCoreConfig

I'm open to changes that move the current implementation closer to PEP 432. But it seems like I don't understand the subtle parts of this PEP well.
* core/runtime config + minimal
I'm not sure of what you mean by "minimal". I collected *all* parameters needed to initialize Python and there are something like 40 parameters or more. But _PyCoreConfig has enough parameters to initialize a full Python with a working importlib and a REPL, for example. Where do you put the limit for "minimal"?
* main interpreter config + includes everything needed to finish full runtime initialization
For practical reasons, I prefer to be able to pass the "path configuration" at the C level to be able to initialize importlib. IMHO it makes the current code base simpler, since the path computation is fully implemented in C. For example, it allows embedders to use a fixed sys.path in C. IMHO a good example is to imagine a Python runtime with *no* filesystem access, where everything is built into the binary. So we have to completely skip the code computing the path configuration, since this operation accesses the filesystem as well! Maybe something should be changed here?
The duplication is due to there being C and PyObject versions. It is for the sake of embedders (and a little bit of sanity). The big reason why it shouldn't be a problem is because PyMainInterpreterConfig is generated directly from PyRuntimeConfig (AKA PyCoreConfig) and only *after* we've used the runtime config to bootstrap the limited runtime (after which it shouldn't be modified ever). So there's no risk of inconsistency, right?
Currently, core_config and main_config can be modified, as global variables: some parameters exist in 3 versions, and each can be modified. And it's unclear which one has the highest priority. For example, if we decide to always rely on core_config, we have to modify the C code to no longer access Py_VerboseFlag after Py_Initialize(). I'm talking about the current C code, not your theoretical API.
Perhaps it would make sense to only keep a const copy of both, to avoid modification?
Maybe. But currently, some flags are modified after Py_Initialize(), especially Py_InspectFlag in main.c. I would prefer to keep a read-only configuration to reflect what sys.flags contains and know how Python has been initialized.
As noted above, the core/runtime config should probably be on PyRuntimeState instead.
For me it doesn't make sense to put all _PyCoreConfig parameters into PyRuntimeState. PyRuntimeState seems to be a singleton, so it means that all interpreters would have the same configuration, whereas I like the idea of having a different verbose and/or sys.path per interpreter. Or maybe I misunderstood what you mean by "core config". I'm talking about the _current_ _PyCoreConfig in master.

Victor
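The distinction drawn here, between a process-wide singleton and options that differ per interpreter, can be sketched with a toy model. This is Python, not CPython's data structures: `ToyInterpreter` and `ToyRuntimeState` are hypothetical names that only loosely mirror PyInterpreterState and PyRuntimeState.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInterpreter:
    """Per-interpreter options in this model: each instance has its own copy."""
    name: str
    verbose: int = 0
    sys_path: list = field(default_factory=list)


class ToyRuntimeState:
    """Process-wide singleton in this model: tracks interpreters, not their options."""

    def __init__(self):
        self.interpreters = {}

    def new_interpreter(self, name, **options):
        interp = ToyInterpreter(name, **options)
        self.interpreters[name] = interp
        return interp


runtime = ToyRuntimeState()
main = runtime.new_interpreter("main", sys_path=["/usr/lib/python"])
sub = runtime.new_interpreter("sub", verbose=1, sys_path=["/opt/sandbox"])
```

Because `verbose` and `sys_path` live on each interpreter object rather than on the shared runtime object, changing them for `sub` leaves `main` untouched. Storing them on the singleton would force every interpreter to share one value.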

On 4 August 2018 at 08:37, Victor Stinner <vstinner@redhat.com> wrote:
It seems like PEP 432 proposes an API designed from scratch as the target API. I started from the 28-year-old C code and I tried to clean up the code. Our methods are different, so it's not surprising that the result is different :-)
No, you didn't start from the 28 year old C code - you started from the private initial implementation of PEP 432, as per https://www.python.org/dev/peps/pep-0432/#implementation-strategy (and PEP 432 in turn was born from the Python 3.3 changes needed to integrate importlib properly into the __main__ initialisation process).

Eric and I merged that [1], because it became apparent while I was working on the core settings management framework that actually migrating individual settings needed to happen in-tree, as the alternative of continuing to maintain it out of tree would at best result in an enormous unreviewable patch, and more likely never result in a patch at all (as the churn rate on CPython was high enough to cause regular conflicts even with the framework branch, let alone once we started migrating individual settings).

The current private API doesn't meet some of the original design goals of the PEP, but the time to reconcile that (and figure out whether it's the API design or the implementation that should change) is while we're updating and reviewing the PEP against the current implementation - while everything remained private, it didn't make sense to throw any additional roadblocks in the way of the excellent work you were doing.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 3 August 2018 at 01:59, Victor Stinner <vstinner@redhat.com> wrote:
2018-08-02 1:18 GMT+02:00 Eric Snow <ericsnowcurrently@gmail.com>:
The "core" config is basically the config for the runtime. In fact, PEP 432 renamed "core" to "runtime". Please keep the firm distinction between the runtime and the (main) interpreter.
There is already something called _PyRuntime but it's shared between all interpreters.
_PyCoreConfig is already *per* interpreter.
Would you mind to elaborate what you mean by the "main interpreter"? I don't see anything obvious in the current code about what is a "main interpreter". Technically, I don't see anything like that.
I'm still not convinced that we need _PyMainInterpreterConfig: _PyCoreConfig contains the same information. Is it really worth it to duplicate all of _PyCoreConfig (more than 36 fields) in _PyMainInterpreterConfig? _PyMainInterpreterConfig adds a third copy of many parameters: another opportunity to introduce an inconsistency. Right now, an interpreter contains both: core and main configurations...
The issue is massive scope creep in the "core config": currently you need to fully specify everything in order to even use the builtin data structs. That's not the design goal of PEP 432: the idea is to have an absolutely bare minimum set of settings that gives a working C API, but won't let you access the host operating system.
I wasn't reining you in on it because there were real problems to be solved in getting the multi-stage startup to work at all given the current code structure. Now that it's working though, we should be looking to move settings back out of coreconfig, and reducing the amount of startup work that needs to be done using raw C code that can't rely on the CPython C API.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jul 31, 2018, at 15:14, Victor Stinner <vstinner@redhat.com> wrote:
I finished my work on the _PyCoreConfig structure: it's a C structure in Include/pystate.h which has many fields used to configure Python initialization. In Python 3.6 and older, these parameters were scattered around the code, and it was hard to get an exhaustive list of them.
Great work Victor! +1 for making _PyCoreConfig public, although I’m sure you’re only proposing that for Python 3.8, not in any future backport.
I had to refactor a lot of code to implement all of that.
The problem is that Python 3.7 got the half-baked implementation, and it caused issues:
* Calling Py_Main() after Py_Initialize() fails with a fatal error on Python 3.7.0 https://bugs.python.org/issue34008
* PYTHONOPTIMIZE environment variable is ignored by Py_Initialize() https://bugs.python.org/issue34247
I fixed the first issue, and I'm now working on the second one to see how it can be fixed. Another option would be to backport the code from master to the 3.7 branch, since the code in master has a much better design. But that requires backporting a lot of changes. I'm not sure yet what the best option is.
Do you have a WIP branch for the backport? I agree that it’s probably low enough risk given the private nature of the API in 3.7, but it’s up to Ned to decide. -Barry

On 1 August 2018 at 08:14, Victor Stinner <vstinner@redhat.com> wrote:
Hi,
I finished my work on the _PyCoreConfig structure: it's a C structure in Include/pystate.h which has many fields used to configure Python initialization. In Python 3.6 and older, these parameters were scattered around the code, and it was hard to get an exhaustive list of them.
This work is linked to the Nick Coghlan's PEP 432 "Restructuring the CPython startup sequence": https://www.python.org/dev/peps/pep-0432/
Right now, the new API is still private. Nick Coghlan split the initialization in two parts: "core" and "main". I'm not sure that this split is needed.
It is, because one of the aims is to make it clear when frozen bytecode (à la importlib) and other builtin modules can start to be used as part of the configuration process. That point is when the core configuration completes. By contrast, main interpreter configuration is only complete when you have full filesystem access as well (including external imports). That separation also means that an embedding application can choose *not* to proceed to the second step, and thus limit imports to the modules built into the application.
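To make the two-step idea concrete, here is a minimal toy model in Python. It is only a sketch of the concept: ToyRuntime, initialize_core(), initialize_main() and can_import() are hypothetical names invented for illustration, and none of this is the private C API or the design in PEP 432.

```python
class ToyRuntime:
    """Hypothetical model of a two-phase startup (not the CPython API)."""

    def __init__(self, builtin_modules):
        # Modules compiled into the application (frozen or builtin).
        self.builtin_modules = frozenset(builtin_modules)
        # Modules reachable through the filesystem; empty until phase 2.
        self.external_modules = frozenset()
        self.core_ready = False
        self.main_ready = False

    def initialize_core(self):
        # Phase 1: builtin/frozen modules become usable, nothing else.
        self.core_ready = True

    def initialize_main(self, external_modules):
        # Phase 2: only valid once the core phase has completed.
        if not self.core_ready:
            raise RuntimeError("core initialization must complete first")
        self.external_modules = frozenset(external_modules)
        self.main_ready = True

    def can_import(self, name):
        if not self.core_ready:
            return False
        if name in self.builtin_modules:
            return True
        # External imports require the second phase.
        return self.main_ready and name in self.external_modules


# An embedder that stops after the core phase only sees builtin modules.
embedded = ToyRuntime({"sys", "_imp"})
embedded.initialize_core()
print(embedded.can_import("sys"), embedded.can_import("json"))
```

In this model, an application that never calls initialize_main() stays restricted to the modules it was built with, which is the property described above.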
We should see what to do, but it would be nice to make the _PyCoreConfig API public! IMHO it's way better than the old way to configure Python initialization.
Step 1 will be to bring PEP 432 up to date with what actually happened - we learned a lot from your and Eric's efforts in actually implementing the draft design as a private API, but the current PEP still reflects the original design concept. Thanks for all your work on this! It's exciting to see it finally coming to fruition :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (5)
- Barry Warsaw
- Brett Cannon
- Eric Snow
- Nick Coghlan
- Victor Stinner