Re: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence

Gah, the PEP number in the subject should, of course, be 432 (not 342). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Nick, PEP 432 is looking very nice. It'll be fun to watch the implementation come together. :) Some comments... The start up sequences:
* Pre-Initialization - no interpreter available
* Initialization - interpreter partially available
What about "Initializing"?
* Initialized - full interpreter available, __main__ related metadata incomplete
* Main Execution - optional state, __main__ related metadata populated, bytecode executing in the __main__ module namespace
What is "optional" about this state? Maybe it should be called "Operational"?
... separate system Python (spython) executable ...
I love the idea, but I'm not crazy about the name. What about `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :)
<TBD: Did I miss anything?>
What about sys.implementation?
as it failed to be updated for the virtual environment support added in Python 3.3 (detailed in PEP 420).
venv is defined in PEP 405 (there are two cases of mis-referencing). Note that there may be other important build time settings on some platforms. An example is Debian/Ubuntu, where we define the multiarch triplet in the configure script, and pass that through Makefile(.pre.in) to sysmodule.c for exposure as sys.implementation._multiarch.
For a command executed with -c, it will be the string "-c". For explicitly requested input from stdin, it will be the string "-".
Wow, I couldn't believe it but it's true! That seems crazy useless. :)
Embedding applications must call Py_SetArgv themselves. The CPython logic for doing so is part of Py_Main() and is not exposed separately. However, the runpy module does provide roughly equivalent logic in runpy.run_module and runpy.run_path.
As I've mentioned before on the python-porting mailing list, this is actually more difficult than it seems because main() takes char*s but Py_SetArgv() and Py_SetProgramName() take wchar_t*s. Maybe Python's own conversion could be refactored to make this easier either as part of this PEP or after the PEP is implemented.
int Py_ReadConfiguration(Py_Config *config);
The config argument should be a pointer to a Python dictionary. For any supported configuration setting already in the dictionary, CPython will sanity check the supplied value, but otherwise accept it as correct.
So why not define this to take a PyObject* or a PyDictObject* ? (also: the Py_Config struct members need the correct concrete type pointers, e.g. PyDictObject*)
Alternatively, settings may be overridden after the Py_ReadConfiguration call (this can be useful if an embedding application wants to adjust a setting rather than replace it completely, such as removing sys.path[0]).
How will setting something after Py_ReadConfiguration() is called change a value such as sys.path? Or is this the reason why you pass a Py_Config to Py_EndInitialization()? (also, see the type typo <wink> in the definition of Py_EndInitialization()) Also, I suggest taking the opportunity to change the sense of flags such as no_site and dont_write_bytecode. I find it much more difficult to reason that "dont_write_bytecode = 0" means *do* write bytecode, rather than "write_bytecode = 1". I.e. positives are better than double-negatives.
sys.argv[0] may not yet have its final value: it will be -m when executing a module or package with CPython
Gosh, wouldn't it be nice if this could have a more useful value?
Initial thought is that hiding the various options behind a single API would make that API too complicated, so 3 separate APIs is more likely:
+1
The interpreter state will be updated to include details of the configuration settings supplied during initialization by extending the interpreter state object with an embedded copy of the Py_CoreConfig and Py_Config structs.
Couldn't it just have a dict with all the values from both structs collapsed into it?
For debugging purposes, the configuration settings will be exposed as a sys._configuration simple namespace
I suggest un-underscoring the name and making it public. It might be useful for other than debugging purposes.
Is Py_IsRunningMain() worth keeping?
Perhaps. Does it provide any additional information above Py_IsInitialized()?
Should the answers to Py_IsInitialized() and Py_IsRunningMain() be exposed via the sys module?
I can't think of a use case.
Is the Py_Config struct too unwieldy to be practical? Would a Python dictionary be a better choice?
Although I see why you've spec'd it this way, I don't like having *two* config structures (Py_CoreConfig and Py_Config). Having a dictionary for the latter would probably be fine, and in fact you could copy the Py_Config values into it (when possible during the init sequence) and expose it in the sys module.
Would it be better to manage the flag variables in Py_Config as Python integers so the struct can be initialized with a simple memset(config, 0, sizeof(*config))?
Would we even notice the optimization?
A System Python Executable
This should probably at least mention Christian's idea of the -I flag (which I think hasn't been PEP'd yet). We can bikeshed about the name of the executable later. :) Cheers, -Barry

On 1/5/2013 4:42 PM, Barry Warsaw wrote:
Also, I suggest taking the opportunity to change the sense of flags such as no_site and dont_write_bytecode. I find it much more difficult to reason that "dont_write_bytecode = 0" means *do* write bytecode, rather than "write_bytecode = 1". I.e. positives are better than double-negatives.
I.e., you prefer positive flags, with some on by default, over having all flags indicate a non-default condition. I would too, but I don't hack on the C code base. 'dont_write_bytecode' is especially ugly. In any case, this seems orthogonal to Nick's PEP and should be a separate discussion (on pydev), tracker issue, and patch. Is the current tradition just happenstance or something that some of the major C developers strongly care about? -- Terry Jan Reedy

On Sun, Jan 6, 2013 at 9:54 AM, Terry Reedy <tjreedy@udel.edu> wrote:
On 1/5/2013 4:42 PM, Barry Warsaw wrote:
Also, I suggest taking the opportunity to change the sense of flags such as no_site and dont_write_bytecode. I find it much more difficult to reason that "dont_write_bytecode = 0" means *do* write bytecode, rather than "write_bytecode = 1". I.e. positives are better than double-negatives.
I.e., you prefer positive flags, with some on by default, over having all flags indicate a non-default condition. I would too, but I don't hack on the C code base. 'dont_write_bytecode' is especially ugly.
Would it be less ugly if called 'suppress_bytecode'? It sounds less negative, but does the same thing. Suppressing something is an active and positive action (though the democratic decision to not publish is quite different, as Yes Minister proved). ChrisA

On Sun, Jan 6, 2013 at 7:42 AM, Barry Warsaw <barry@python.org> wrote:
Hi Nick,
PEP 432 is looking very nice. It'll be fun to watch the implementation come together. :)
Some comments...
The start up sequences:
* Pre-Initialization - no interpreter available
* Initialization - interpreter partially available
What about "Initializing"?
Makes sense, changed.
* Initialized - full interpreter available, __main__ related metadata incomplete
* Main Execution - optional state, __main__ related metadata populated, bytecode executing in the __main__ module namespace
What is "optional" about this state? Maybe it should be called "Operational"?
Unlike the other phases which are sequential and distinct, "Main Execution" is a subphase of Initialized. Embedding applications without the concept of a "__main__" module (e.g. mod_wsgi) will never use it.
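Nick's description of the phases (three sequential phases, with Main Execution as an optional subphase of Initialized) maps naturally onto a small state machine. The following is only a sketch: the enum, struct, and function names are hypothetical stand-ins rather than the API proposed in the PEP, and `is_running_main` merely mirrors the intent of the proposed Py_IsRunningMain() query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the startup phases (names are illustrative only). */
typedef enum {
    PHASE_PRE_INITIALIZATION, /* no interpreter available */
    PHASE_INITIALIZING,       /* interpreter partially available */
    PHASE_INITIALIZED         /* full interpreter available */
} startup_phase;

typedef struct {
    startup_phase phase;
    /* "Main Execution" is not a fourth sequential phase: it is a flag that can
       only be set once the interpreter is Initialized. Embedding applications
       without a __main__ module (e.g. mod_wsgi) never set it. */
    bool running_main;
} startup_state;

static void begin_initialization(startup_state *s) {
    assert(s->phase == PHASE_PRE_INITIALIZATION);
    s->phase = PHASE_INITIALIZING;
}

static void end_initialization(startup_state *s) {
    assert(s->phase == PHASE_INITIALIZING);
    s->phase = PHASE_INITIALIZED;
}

static void enter_main_execution(startup_state *s) {
    assert(s->phase == PHASE_INITIALIZED);
    s->running_main = true;
}

static bool is_initialized(const startup_state *s) {
    return s->phase == PHASE_INITIALIZED;
}

/* True only once __main__ metadata (e.g. sys.argv[0]) has its final values. */
static bool is_running_main(const startup_state *s) {
    return is_initialized(s) && s->running_main;
}
```

Note that `is_initialized` stays true during main execution; the second query only adds information once the placeholder __main__ metadata has been replaced.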
... separate system Python (spython) executable ...
I love the idea, but I'm not crazy about the name. What about `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :)
Yeah, I'll go with "python-minimal".
<TBD: Did I miss anything?>
What about sys.implementation?
Unaffected, since that's all configured at build time. I've added an explicit note that sys.implementation and sysconfig.get_config_vars() are not affected by this initial proposal.
as it failed to be updated for the virtual environment support added in Python 3.3 (detailed in PEP 420).
venv is defined in PEP 405 (there are two cases of mis-referencing).
Oops, fixed.
Note that there may be other important build time settings on some platforms. An example is Debian/Ubuntu, where we define the multiarch triplet in the configure script, and pass that through Makefile(.pre.in) to sysmodule.c for exposure as sys.implementation._multiarch.
Yeah, I don't want to mess with adding new runtime configuration options at this point, beyond the features inherent in breaking up the existing initialization phases.
For a command executed with -c, it will be the string "-c". For explicitly requested input from stdin, it will be the string "-".
Wow, I couldn't believe it but it's true! That seems crazy useless. :)
Yup. While researching this PEP I had many moments where I was looking at the screen going "WTF, we seriously do that?" (most notably when I learned that using the -W and -X options means we create Python objects in Py_Main() before the call to Py_Initialize(). This is why there has to be an explicit call to _Py_Random_Init() before the option processing code)
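The placeholder values of sys.argv[0] discussed in this exchange amount to a simple lookup on how the main code was requested. A minimal sketch; the enum and function are hypothetical, and the values follow the PEP text quoted in this thread (for a plain script, sys.argv[0] is the script path).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enumeration of how the code to run as __main__ was requested. */
typedef enum { RUN_SCRIPT, RUN_COMMAND, RUN_MODULE, RUN_STDIN } run_mode;

/* Value sys.argv[0] holds while __main__ is still being located. */
static const char *placeholder_argv0(run_mode mode, const char *script_path) {
    assert(mode != RUN_SCRIPT || script_path != NULL);
    switch (mode) {
    case RUN_COMMAND: return "-c";
    case RUN_MODULE:  return "-m";  /* replaced once runpy locates the module */
    case RUN_STDIN:   return "-";
    case RUN_SCRIPT:  return script_path;
    }
    return NULL;
}
```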
Embedding applications must call Py_SetArgv themselves. The CPython logic for doing so is part of Py_Main() and is not exposed separately. However, the runpy module does provide roughly equivalent logic in runpy.run_module and runpy.run_path.
As I've mentioned before on the python-porting mailing list, this is actually more difficult than it seems because main() takes char*s but Py_SetArgv() and Py_SetProgramName() take wchar_t*s.
Maybe Python's own conversion could be refactored to make this easier either as part of this PEP or after the PEP is implemented.
Yeah, one of the changes in the PEP is that you can pass program_name and raw_argv as a Unicode object or a list of Unicode objects instead of using wchar_t.
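With the existing wchar_t-based API, the char*-to-wchar_t* conversion Barry describes can be approximated with the C standard library. This is a minimal sketch using mbstowcs(); it is not CPython's own decoding logic (later Python versions expose Py_DecodeLocale(), which uses the surrogateescape error handler so undecodable bytes survive, whereas this sketch simply fails on them), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert main()'s char* argv into a newly allocated,
   NULL-terminated wchar_t* array using the current locale.
   Returns NULL on allocation failure or an invalid multibyte sequence. */
static wchar_t **argv_to_wide(int argc, char **argv) {
    assert(argc >= 0);
    wchar_t **wargv = calloc((size_t)argc + 1, sizeof *wargv);
    if (wargv == NULL)
        return NULL;
    for (int i = 0; i < argc; i++) {
        size_t len = mbstowcs(NULL, argv[i], 0);  /* length in wide chars */
        if (len == (size_t)-1)
            goto fail;                            /* not decodable in this locale */
        wargv[i] = malloc((len + 1) * sizeof **wargv);
        if (wargv[i] == NULL)
            goto fail;
        mbstowcs(wargv[i], argv[i], len + 1);     /* copies the terminator too */
    }
    return wargv;
fail:
    for (int i = 0; i < argc; i++)
        free(wargv[i]);                           /* unconverted slots are NULL */
    free(wargv);
    return NULL;
}

static void free_wide_argv(wchar_t **wargv) {
    if (wargv == NULL)
        return;
    for (wchar_t **p = wargv; *p != NULL; p++)
        free(*p);
    free(wargv);
}
```

An embedding application would normally call setlocale(LC_ALL, "") before converting, then hand the results to Py_SetProgramName() and PySys_SetArgv().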
int Py_ReadConfiguration(Py_Config *config);
The config argument should be a pointer to a Python dictionary. For any supported configuration setting already in the dictionary, CPython will sanity check the supplied value, but otherwise accept it as correct.
So why not define this to take a PyObject* or a PyDictObject* ?
That wording is a holdover from a previous version of the PEP where this was indeed a dictionary pointer. I came around to Antoine's point of view that since we have a fixed list of supported settings at any given point in time, a struct would be easier to deal with on the C side. However, I missed a few spots (including this one) when I made the change to the PEP.
(also: the Py_Config struct members need the correct concrete type pointers, e.g. PyDictObject*)
Fixed.
Alternatively, settings may be overridden after the Py_ReadConfiguration call (this can be useful if an embedding application wants to adjust a setting rather than replace it completely, such as removing sys.path[0]).
How will setting something after Py_ReadConfiguration() is called change a value such as sys.path? Or is this the reason why you pass a Py_Config to Py_EndInitialization()?
Correct - calling Py_ReadConfiguration has no effect on the interpreter state. The interpreter state only changes in Py_EndInitialization. I'll include a more explicit explanation of that behaviour.
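The read/modify/apply split described here (Py_ReadConfiguration only fills in a struct, and nothing in the interpreter changes until Py_EndInitialization) can be sketched with stand-in types. Everything below is hypothetical: the config is reduced to a list of sys.path entries, the mock_* functions only imitate the proposed API, and the example path is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATHS 8

/* Hypothetical stand-in for Py_Config: just the sys.path entries. */
typedef struct {
    const char *path[MAX_PATHS];
    size_t path_len;
} mock_config;

/* Hypothetical stand-in for the interpreter state. */
typedef struct {
    const char *sys_path[MAX_PATHS];
    size_t sys_path_len;
    int initialized;
} mock_interp;

/* Imitates Py_ReadConfiguration: fills in unset fields of the config only.
   It never touches the interpreter state. */
static int mock_read_configuration(mock_config *config) {
    if (config->path_len == 0) {
        config->path[0] = "";                   /* placeholder for script dir */
        config->path[1] = "/usr/lib/python3.4"; /* made-up example entry */
        config->path_len = 2;
    }
    return 0;
}

/* Imitates Py_EndInitialization: the only step that changes the interpreter. */
static int mock_end_initialization(mock_interp *interp, const mock_config *config) {
    memcpy(interp->sys_path, config->path, config->path_len * sizeof config->path[0]);
    interp->sys_path_len = config->path_len;
    interp->initialized = 1;
    return 0;
}

/* Embedding application: adjust a setting rather than replace it,
   e.g. drop sys.path[0] after the defaults have been read. */
static void drop_first_path_entry(mock_config *config) {
    assert(config->path_len > 0);
    memmove(&config->path[0], &config->path[1],
            (config->path_len - 1) * sizeof config->path[0]);
    config->path_len--;
}
```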
(also, see the type typo <wink> in the definition of Py_EndInitialization())
Also, I suggest taking the opportunity to change the sense of flags such as no_site and dont_write_bytecode. I find it much more difficult to reason that "dont_write_bytecode = 0" means *do* write bytecode, rather than "write_bytecode = 1". I.e. positives are better than double-negatives.
While I agree with this principle in general, I'm deliberately not doing anything about most of these because these settings are already exposed in their double-negative form as environment variables (PYTHONDONTWRITEBYTECODE, PYTHONNOUSERSITE), as global variables that can be set by an embedding application (Py_DontWriteBytecodeFlag, Py_NoSiteFlag, Py_NoUserSiteDirectory) and as sys module attributes (sys.dont_write_bytecode, sys.flags.no_site, sys.flags.no_user_site). However, I *am* going to change the sense of the no_site setting to "enable_site_config". The reason for this is that the meaning of the setting actually changed in Python 3.3 to also mean "disable the side effects that are currently implicit in importing the site module", in addition to implicitly importing that module as part of the startup sequence.
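To illustrate Barry's suggestion (which the PEP adopts only for enable_site_config), the existing double-negative inputs can be inverted exactly once when settings are read, so the rest of the code only sees positive flags. A minimal sketch: write_bytecode and enable_user_site are hypothetical field names, and treating a non-empty environment variable as "set" is an assumption about the intended semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical config fragment storing flags in positive form. */
typedef struct {
    bool write_bytecode;      /* inverse of PYTHONDONTWRITEBYTECODE */
    bool enable_user_site;    /* inverse of PYTHONNOUSERSITE */
    bool enable_site_config;  /* inverse of no_site (the -S option) */
} positive_flags;

/* Assumption: an environment variable counts as set when non-empty. */
static bool is_set(const char *value) {
    return value != NULL && value[0] != '\0';
}

/* Pure translation step: each double negative is inverted exactly once. */
static positive_flags flags_from_inputs(const char *dont_write_bytecode_env,
                                        const char *no_user_site_env,
                                        bool dash_S_given) {
    positive_flags f;
    f.write_bytecode = !is_set(dont_write_bytecode_env);
    f.enable_user_site = !is_set(no_user_site_env);
    f.enable_site_config = !dash_S_given;
    return f;
}

/* Convenience wrapper reading the real environment variables. */
static positive_flags flags_from_environment(bool dash_S_given) {
    return flags_from_inputs(getenv("PYTHONDONTWRITEBYTECODE"),
                             getenv("PYTHONNOUSERSITE"),
                             dash_S_given);
}
```

Keeping the translation in a pure function makes the inversion easy to test without touching the process environment.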
sys.argv[0] may not yet have its final value: it will be -m when executing a module or package with CPython
Gosh, wouldn't it be nice if this could have a more useful value?
It does once runpy is done with it (it has the __file__ attribute corresponding to whatever code is actually being run as __main__). At this point in the initialisation sequence, though, __main__ is still the builtin __main__ module, and there's no getting around the fact that we need to be able to import and run arbitrary Python code (both from the standard library and from package __init__ files) in order to properly locate __main__.
Initial thought is that hiding the various options behind a single API would make that API too complicated, so 3 separate APIs is more likely:
+1
The interpreter state will be updated to include details of the configuration settings supplied during initialization by extending the interpreter state object with an embedded copy of the Py_CoreConfig and Py_Config structs.
Couldn't it just have a dict with all the values from both structs collapsed into it?
It could, but that's substantially less convenient from the C side of the API.
For debugging purposes, the configuration settings will be exposed as a sys._configuration simple namespace
I suggest un-underscoring the name and making it public. It might be useful for other than debugging purposes.
The underscore is there because the specific fields are currently CPython specific. Another implementation may not make these settings configurable at all. If there are particular settings that would be useful to modules like importlib or site, then we may want to look at exposing them through sys.implementation as required attributes, but that's a distinct PEP from this one.
Is Py_IsRunningMain() worth keeping?
Perhaps. Does it provide any additional information above Py_IsInitialized()?
Yes - it indicates that sys.argv[0] and the metadata in __main__ are fully updated (i.e. the placeholder info used while executing Python code in order to locate __main__ in the first place has been replaced with the real info).
Should the answers to Py_IsInitialized() and Py_IsRunningMain() be exposed via the sys module?
I can't think of a use case.
Neither can I. I'll leave them as "for embedding apps only" until someone comes up with an actual reason to expose them.
Is the Py_Config struct too unwieldy to be practical? Would a Python dictionary be a better choice?
Although I see why you've spec'd it this way, I don't like having *two* config structures (Py_CoreConfig and Py_Config). Having a dictionary for the latter would probably be fine, and in fact you could copy the Py_Config values into it (when possible during the init sequence) and expose it in the sys module.
Yeah, I originally had just Py_CoreConfig and then a PyDictObject for the rest of it. The first draft of Py_Config embedded a copy of Py_CoreConfig as the first field. However, I eventually settled on the current scheme as best aligning the model with the reality that we really do have two kinds of configuration setting which need to be handled differently:
- Py_CoreConfig holds the settings that are required to create a PyInterpreterState at all (passed to Py_BeginInitialization)
- Py_Config holds the settings that are required to get to a fully functional interpreter (passed to Py_EndInitialization)
Using a struct for both of them is easier to work with from C, and makes the number vs string vs list vs mapping distinction for the various settings self-documenting.
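The two-struct split can be sketched with stand-in types to show why each struct belongs to a different initialization step, and how C field types document the kind of value each setting holds. The field names and mock_* functions are hypothetical illustrations, not the PEP's definitions; the interpreter state keeps embedded copies of both structs, as the PEP text quoted earlier in the thread proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Py_CoreConfig: only what is needed to create an
   interpreter state at all (e.g. hash randomisation settings). */
typedef struct {
    bool ignore_environment;
    bool use_hash_seed;
    unsigned long hash_seed;
} core_config;

/* Hypothetical stand-in for Py_Config: what is needed to reach a fully
   functional interpreter. The field types document each setting's kind. */
typedef struct {
    const char *program_name;         /* string */
    const char **module_search_path;  /* list of strings, NULL-terminated */
    bool write_bytecode;              /* flag */
} full_config;

typedef enum { STATE_NONE, STATE_CORE, STATE_FULL } interp_stage;

typedef struct {
    interp_stage stage;
    core_config core;  /* embedded copies of both structs */
    full_config full;
} mock_interp_state;

/* Imitates Py_BeginInitialization: consumes only the core settings. */
static void mock_begin_initialization(mock_interp_state *s, const core_config *core) {
    assert(s->stage == STATE_NONE);
    s->core = *core;
    s->stage = STATE_CORE;
}

/* Imitates Py_EndInitialization: requires the core step to have run first. */
static void mock_end_initialization(mock_interp_state *s, const full_config *config) {
    assert(s->stage == STATE_CORE);
    s->full = *config;
    s->stage = STATE_FULL;
}
```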
Would it be better to manage the flag variables in Py_Config as Python integers so the struct can be initialized with a simple memset(config, 0, sizeof(*config))?
Would we even notice the optimization?
I'll clarify this a bit - it's a maintainability question, rather than an optimization. (i.e. I think _Py_Config_INIT is ugly as hell, I just don't have any better ideas)
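The maintainability trade-off comes from sentinel values: a plain C int has no natural "not set" value other than something like -1, so a struct of ints needs an initializer macro listing every field, whereas pointer fields (such as PyObject* holding Python integers) can use NULL for "not set", making a zeroed struct a valid starting point. A minimal sketch with hypothetical types; `boxed_int` is an assumed stand-in for a Python integer so the example compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Variant 1: plain C ints, where -1 means "not set". Zero is a real value,
   so every field has to be listed in an initializer macro. */
typedef struct {
    int dont_write_bytecode;
    int no_site;
    int verbose;
} int_config;

#define INT_CONFIG_INIT {-1, -1, -1}

/* Variant 2: pointer fields, where NULL means "not set". */
typedef struct { long value; } boxed_int;  /* hypothetical stand-in for PyLong */

typedef struct {
    const boxed_int *dont_write_bytecode;
    const boxed_int *no_site;
    const boxed_int *verbose;
} ptr_config;

/* All-bits-zero is a null pointer on the platforms CPython supports,
   although ISO C does not strictly guarantee it. */
static void ptr_config_init(ptr_config *config) {
    assert(config != NULL);
    memset(config, 0, sizeof(*config));
}

static int int_is_set(int field) { return field != -1; }
static int ptr_is_set(const boxed_int *field) { return field != NULL; }
```

Adding a field to `int_config` also means updating `INT_CONFIG_INIT`, which is the maintenance burden the memset approach avoids.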
A System Python Executable
This should probably at least mention Christian's idea of the -I flag (which I think hasn't been PEP'd yet). We can bikeshed about the name of the executable later. :)
Yeah, I've gone through and added a bunch of tracker links, including that one. There's a significant number of things which this should make easier in the future (e.g. I haven't linked to it, but the proposal to support custom memory allocators could be handled by adding more fields to Py_CoreConfig rather than more C level global variables) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
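Nick's remark about custom memory allocators can be sketched as follows: allocator hooks travel in the struct handed to the first initialization step instead of living in process-wide globals. The allocator proposal itself is not linked here, so the field names and hook signatures below are entirely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hooks carried in a core config struct. */
typedef struct {
    void *(*malloc_fn)(void *ctx, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
    void *ctx;  /* opaque data passed back to the hooks */
} allocator_config;

typedef struct {
    allocator_config allocator;  /* hypothetical extra Py_CoreConfig field */
} core_config_with_alloc;

static void *default_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void default_free(void *ctx, void *ptr) { (void)ctx; free(ptr); }

/* Fill in the C library defaults for any hook the embedder left NULL. */
static void apply_allocator_defaults(core_config_with_alloc *config) {
    assert(config != NULL);
    if (config->allocator.malloc_fn == NULL)
        config->allocator.malloc_fn = default_malloc;
    if (config->allocator.free_fn == NULL)
        config->allocator.free_fn = default_free;
}
```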

On Sun, Jan 6, 2013 at 5:26 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I love the idea, but I'm not crazy about the name. What about `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :)
Yeah, I'll go with "python-minimal".
Oops, I was editing the PEP and the email at the same time, and changed my mind about this without fixing the email. I actually went with "pysystem" for now, but I also noted the need to paint this bikeshed under Open Questions. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (4)
- Barry Warsaw
- Chris Angelico
- Nick Coghlan
- Terry Reedy