Re: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence

Gah, the PEP number in the subject should, of course, be 432 (not 342). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Nick, PEP 432 is looking very nice. It'll be fun to watch the implementation come together. :) Some comments... The start up sequences:
* Pre-Initialization - no interpreter available
* Initialization - interpreter partially available
What about "Initializing"?
What is "optional" about this state? Maybe it should be called "Operational"?
... separate system Python (spython) executable ...
I love the idea, but I'm not crazy about the name. What about `python-minimal`? (Yes, it's deliberately longer. Symlinks ftw. :)
<TBD: Did I miss anything?>
What about sys.implementation?
as it failed to be updated for the virtual environment support added in Python 3.3 (detailed in PEP 420).
venv is defined in PEP 405 (there are two cases of mis-referencing). Note that there may be other important build time settings on some platforms. An example is Debian/Ubuntu, where we define the multiarch triplet in the configure script, and pass that through Makefile(.pre.in) to sysmodule.c for exposure as sys.implementation._multiarch.
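As a rough illustration of the build-time settings mentioned above, the sketch below reads a few attributes from sys.implementation. The helper name describe_implementation is made up for this example, and _multiarch is a private attribute that is only expected on some builds (such as the Debian/Ubuntu ones described here), so it is looked up defensively.

```python
import sys


def describe_implementation():
    """Collect a few build-time facts exposed through sys.implementation."""
    impl = sys.implementation
    return {
        "name": impl.name,
        "cache_tag": impl.cache_tag,
        # _multiarch is private and platform dependent; it may be absent.
        "multiarch": getattr(impl, "_multiarch", None),
    }


if __name__ == "__main__":
    for key, value in describe_implementation().items():
        print(f"{key}: {value}")
```

None of these values change at runtime, which is consistent with treating them as outside the scope of the startup configuration.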
For a command executed with -c, it will be the string "-c"
For explicitly requested input from stdin, it will be the string "-"
Wow, I couldn't believe it but it's true! That seems crazy useless. :)
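The behaviour quoted above can be checked from Python itself. This is a small sketch that spawns child interpreters with subprocess; the helper names and the PROBE string are invented for the example and assume a CPython executable at sys.executable.

```python
import subprocess
import sys

# One-line program that reports what the child sees as sys.argv[0].
PROBE = "import sys; print(sys.argv[0])\n"


def argv0_for_dash_c():
    """Run the probe with -c and return the child's sys.argv[0]."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def argv0_for_stdin():
    """Feed the probe on stdin, requested explicitly with '-'."""
    result = subprocess.run(
        [sys.executable, "-"],
        input=PROBE, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(repr(argv0_for_dash_c()))
    print(repr(argv0_for_stdin()))
```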
As I've mentioned before on the python-porting mailing list, this is actually more difficult than it seems because main() takes char*s but PySys_SetArgv() and Py_SetProgramName() take wchar_t*s. Maybe Python's own conversion could be refactored to make this easier either as part of this PEP or after the PEP is implemented.
int Py_ReadConfiguration(PyConfig *config);
So why not define this to take a PyObject* or a PyDictObject* ? (also: the Py_Config struct members need the correct concrete type pointers, e.g. PyDictObject*)
How will setting something after Py_ReadConfiguration() is called change a value such as sys.path? Or is this the reason why you pass a Py_Config to Py_EndInitialization()? (also, see the type typo <wink> in the definition of Py_EndInitialization()) Also, I suggest taking the opportunity to change the sense of flags such as no_site and dont_write_bytecode. I find it much more difficult to reason that "dont_write_bytecode = 0" means *do* write bytecode, rather than "write_bytecode = 1". I.e. positives are better than double-negatives.
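To make the readability argument concrete, here is a minimal sketch of a positive-sense view over the existing negative flags. The function positive_flags and its key names are hypothetical; only sys.dont_write_bytecode, sys.flags.no_site and sys.flags.no_user_site are real attributes.

```python
import sys


def positive_flags():
    """Return positive-sense equivalents of the current negative flags."""
    return {
        # dont_write_bytecode == 0 means bytecode *is* written.
        "write_bytecode": not sys.dont_write_bytecode,
        "enable_site": not sys.flags.no_site,
        "enable_user_site": not sys.flags.no_user_site,
    }


if __name__ == "__main__":
    for name, enabled in positive_flags().items():
        print(f"{name} = {int(enabled)}")
```

Reading "write_bytecode = 1" requires no mental negation, which is the point being made about double negatives.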
sys.argv[0] may not yet have its final value: it will be -m when executing a module or package with CPython
Gosh, wouldn't it be nice if this could have a more useful value?
Initial thought is that hiding the various options behind a single API would make that API too complicated, so 3 separate APIs is more likely:
+1
Couldn't it just have a dict with all the values from both structs collapsed into it?
For debugging purposes, the configuration settings will be exposed as a sys._configuration simple namespace
I suggest un-underscoring the name and making it public. It might be useful for other than debugging purposes.
Is Py_IsRunningMain() worth keeping?
Perhaps. Does it provide any additional information above Py_IsInitialized()?
Should the answers to Py_IsInitialized() and Py_IsRunningMain() be exposed via the sys module?
I can't think of a use case.
Is the Py_Config struct too unwieldy to be practical? Would a Python dictionary be a better choice?
Although I see why you've spec'd it this way, I don't like having *two* config structures (Py_CoreConfig and Py_Config). Having a dictionary for the latter would probably be fine, and in fact you could copy the Py_Config values into it (when possible during the init sequence) and expose it in the sys module.
Would we even notice the optimization?
A System Python Executable
This should probably at least mention Christian's idea of the -I flag (which I think hasn't been PEP'd yet). We can bikeshed about the name of the executable later. :) Cheers, -Barry

On 1/5/2013 4:42 PM, Barry Warsaw wrote:
I.e., you prefer positive flags, with some on by default, over having all flags indicate a non-default condition. I would too, but I don't hack on the C code base. 'dont_write_bytecode' is especially ugly. In any case, this seems orthogonal to Nick's PEP and should be a separate discussion (on python-dev), tracker issue, and patch. Is the current tradition just happenstance or something that some of the major C developers strongly care about? -- Terry Jan Reedy

On Sun, Jan 6, 2013 at 9:54 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Would it be less ugly if called 'suppress_bytecode'? It sounds less negative, but does the same thing. Suppressing something is an active and positive action (though the democratic decision to not publish is quite different, as Yes Minister proved). ChrisA

On Sun, Jan 6, 2013 at 7:42 AM, Barry Warsaw <barry@python.org> wrote:
Makes sense, changed.
Unlike the other phases which are sequential and distinct, "Main Execution" is a subphase of Initialized. Embedding applications without the concept of a "__main__" module (e.g. mod_wsgi) will never use it.
Yeah, I'll go with "python-minimal".
<TBD: Did I miss anything?>
What about sys.implementation?
Unaffected, since that's all configured at build time. I've added an explicit note that sys.implementation and sysconfig.get_config_vars() are not affected by this initial proposal.
Oops, fixed.
Yeah, I don't want to mess with adding new runtime configuration options at this point, beyond the features inherent in breaking up the existing initialization phases.
Yup. While researching this PEP I had many moments where I was looking at the screen going "WTF, we seriously do that?" (most notably when I learned that using the -W and -X options means we create Python objects in Py_Main() before the call to Py_Initialize(). This is why there has to be an explicit call to _PyRandom_Init() before the option processing code)
Yeah, one of the changes in the PEP is that you can pass program_name and raw_argv as a Unicode object or a list of Unicode objects instead of using wchar_t.
That wording is a holdover from a previous version of the PEP where this was indeed a dictionary pointer. I came around to Antoine's point of view that since we have a fixed list of supported settings at any given point in time, a struct would be easier to deal with on the C side. However, I missed a few spots (including this one) when I made the change to the PEP.
(also: the Py_Config struct members need the correct concrete type pointers, e.g. PyDictObject*)
Fixed.
Correct - calling Py_ReadConfiguration has no effect on the interpreter state. The interpreter state only changes in Py_EndInitialization. I'll include a more explicit explanation of that behaviour.
While I agree with this principle in general, I'm deliberately not doing anything about most of these because these settings are already exposed in their double-negative form as environment variables (PYTHONDONTWRITEBYTECODE, PYTHONNOUSERSITE), as global variables that can be set by an embedding application (Py_DontWriteBytecodeFlag, Py_NoSiteFlag, Py_NoUserSiteDirectory) and as sys module attributes (sys.dont_write_bytecode, sys.flags.no_site, sys.flags.no_user_site). However, I *am* going to change the sense of the no_site setting to "enable_site_config". The reason for this is that the meaning of the setting actually changed in Python 3.3 to also mean "disable the side effects that are currently implicit in importing the site module", in addition to implicitly importing that module as part of the startup sequence.
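The existing exposures can be observed from a child process. The sketch below is only an illustration: the probe helper and PROBE string are made up, while the environment variable, the -B and -S switches, and the sys attributes are the existing ones named above.

```python
import os
import subprocess
import sys

# Child program: report the two negative-sense settings as integers.
PROBE = "import sys; print(int(sys.dont_write_bytecode), int(sys.flags.no_site))"


def probe(extra_args=(), extra_env=None):
    """Run PROBE in a child interpreter and return (dont_write_bytecode, no_site)."""
    env = dict(os.environ)
    env.update(extra_env or {})
    out = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    dont_write, no_site = out.split()
    return bool(int(dont_write)), bool(int(no_site))


if __name__ == "__main__":
    print(probe(extra_env={"PYTHONDONTWRITEBYTECODE": "1"}))
    print(probe(extra_args=["-B"]))
    print(probe(extra_args=["-S"]))
```

Because the same negative names already appear in all of these places, flipping the sense only inside the new config struct would leave two vocabularies for one setting.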
It does once runpy is done with it (it has the __file__ attribute corresponding to whatever code is actually being run as __main__). At this point in the initialisation sequence, though, __main__ is still the builtin __main__ module, and there's no getting around the fact that we need to be able to import and run arbitrary Python code (both from the standard library and from package __init__ files) in order to properly locate __main__.
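The point about runpy filling in the real metadata can be seen with runpy.run_path, which is a real standard library function. The wrapper run_as_main and the temporary file name are invented for this sketch.

```python
import os
import runpy
import tempfile


def run_as_main(source):
    """Run source as __main__ via runpy and return (path, __name__, __file__)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_script.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        # run_path returns the globals of the executed module.
        globs = runpy.run_path(path, run_name="__main__")
        return path, globs["__name__"], globs["__file__"]


if __name__ == "__main__":
    path, name, file = run_as_main("x = 1\n")
    print(name, file == path)
```

Before that step has run, none of this information exists yet, which is why a placeholder value is needed earlier in the sequence.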
It could, but that's substantially less convenient from the C side of the API.
The underscore is there because the specific fields are currently CPython specific. Another implementation may not make these settings configurable at all. If there are particular settings that would be useful to modules like importlib or site, then we may want to look at exposing them through sys.implementation as required attributes, but that's a distinct PEP from this one.
Is Py_IsRunningMain() worth keeping?
Perhaps. Does it provide any additional information above Py_IsInitialized()?
Yes - it indicates that sys.argv[0] and the metadata in __main__ are fully updated (i.e. the placeholder info used while executing Python code in order to locate __main__ in the first place has been replaced with the real info).
Neither can I. I'll leave them as "for embedding apps only" until someone comes up with an actual reason to expose them.
Yeah, I originally had just Py_CoreConfig and then a PyDictObject for the rest of it. The first draft of Py_Config embedded a copy of Py_CoreConfig as the first field. However, I eventually settled on the current scheme as best aligning the model with the reality that we really do have two kinds of configuration setting which need to be handled differently:
- Py_CoreConfig holds the settings that are required to create a PyInterpreterState at all (passed to Py_BeginInitialization)
- Py_Config holds the settings that are required to get to a fully functional interpreter (passed to Py_EndInitialization)
Using a struct for both of them is easier to work with from C, and makes the number vs string vs list vs mapping distinction for the various settings self-documenting.
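The split between the two kinds of settings can be modelled in a few lines of Python. This is a toy model, not the proposed C API: ToyInterpreter, the Phase names, and every field name below are assumptions chosen for illustration. It also shows the point made earlier in this reply that reading the configuration leaves the interpreter state alone until the final step.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PRE_INITIALIZATION = "pre-initialization"
    INITIALIZING = "initializing"
    INITIALIZED = "initialized"


@dataclass
class CoreConfig:
    """Stand-in for Py_CoreConfig: needed to create an interpreter at all."""
    ignore_environment: bool = False  # illustrative field


@dataclass
class Config:
    """Stand-in for Py_Config: needed for a fully functional interpreter."""
    program_name: str = "python"  # illustrative field
    module_search_paths: list = field(default_factory=list)


class ToyInterpreter:
    def __init__(self):
        self.phase = Phase.PRE_INITIALIZATION
        self.core = None
        self.path = []  # plays the role of sys.path

    def begin_initialization(self, core):
        if self.phase is not Phase.PRE_INITIALIZATION:
            raise RuntimeError("already initializing")
        self.core = core
        self.phase = Phase.INITIALIZING

    def read_configuration(self, config):
        """Fill in unset fields; the interpreter state is not touched."""
        if self.phase is not Phase.INITIALIZING:
            raise RuntimeError("call begin_initialization first")
        if not config.module_search_paths:
            config.module_search_paths = ["<default search path>"]
        return config

    def end_initialization(self, config):
        """Only here do the settings take effect."""
        if self.phase is not Phase.INITIALIZING:
            raise RuntimeError("call begin_initialization first")
        self.path = list(config.module_search_paths)
        self.phase = Phase.INITIALIZED


if __name__ == "__main__":
    interp = ToyInterpreter()
    interp.begin_initialization(CoreConfig())
    cfg = interp.read_configuration(Config())
    cfg.module_search_paths.append("/extra")  # edit after reading
    interp.end_initialization(cfg)
    print(interp.path)
```

In this model an embedding application can adjust the filled-in config between the read step and the end step, and only the values present at the end step reach the interpreter.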
I'll clarify this a bit - it's a maintainability question, rather than an optimization. (i.e. I think _Py_Config_INIT is ugly as hell, I just don't have any better ideas)
Yeah, I've gone through and added a bunch of tracker links, including that one. There's a significant number of things which this should make easier in the future (e.g. I haven't linked to it, but the proposal to support custom memory allocators could be handled by adding more fields to Py_CoreConfig rather than more C level global variables). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jan 6, 2013 at 5:26 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Oops, I was editing the PEP and the email at the same time, and changed my mind about this without fixing the email. I actually went with "pysystem" for now, but I also noted the need to paint this bikeshed under Open Questions. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

participants (4): Barry Warsaw, Chris Angelico, Nick Coghlan, Terry Reedy