Structural cleanups to the main CPython repo

I have a feature branch where I'm intermittently working on the bootstrapping changes described in PEP 432. As part of those changes, I've cleaned up a few aspects of the repo layout:

* moved the main executable source file from Modules to a separate Apps directory
* moved the _freezeimportlib and _testembed source files from Modules to a separate Tools directory
* split the monster pythonrun.h/c pair into 3 separate header/impl pairs:
  * bootstrap.h/bootstrap.c
  * shutdown.h/shutdown.c
  * pythonrun.h/pythonrun.c

These structural changes generally mean automatic merges touching the build machinery or the startup or shutdown code fail fairly spectacularly and need a lot of TLC to complete them without losing any changes from the main repo.

Would anyone object if I went ahead and posted patches for making these changes to the main repo? I found they made the code *much* easier to follow when I started to turn the ideas in PEP 432 into working software, and implementing these shifts should make future merges to my feature branch simpler, as well as producing significantly cleaner diffs when PEP 432 gets closer to completion.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, 28 May 2013 22:15:25 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
I have a feature branch where I'm intermittently working on the bootstrapping changes described in PEP 432.
As part of those changes, I've cleaned up a few aspects of the repo layout:
* moved the main executable source file from Modules to a separate Apps directory
Sounds fine (I don't like "Apps" much, but hey :-)).
* moved the _freezeimportlib and _testembed source files from Modules to a separate Tools directory
Well, they should probably go to Apps too, no?
* split the monster pythonrun.h/c pair into 3 separate header/impl pairs: * bootstrap.h/bootstrap.c * shutdown.h/shutdown.c * pythonrun.h/pythonrun.c
I don't think separating bootstrap from shutdown is a good idea. They are quite closely related since one undoes what the other did (and they may also use shared private functions or data). I don't know what goes in the remaining "pythonrun.c", could you detail a bit? Regards Antoine.

On Tue, May 28, 2013 at 10:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 28 May 2013 22:15:25 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
I have a feature branch where I'm intermittently working on the bootstrapping changes described in PEP 432.
As part of those changes, I've cleaned up a few aspects of the repo layout:
* moved the main executable source file from Modules to a separate Apps directory
Sounds fine (I don't like "Apps" much, but hey :-)).
Unfortunately, I don't know any other short word for "things with main functions that we ship to end users" :)
* moved the _freezeimportlib and _testembed source files from Modules to a separate Tools directory
Well, they should probably go to Apps too, no?
I wanted to split out "part of the build/test infrastructure" from "shipped to end users", but I could also live with a simple "Bin" directory that contained both kinds of executable.
* split the monster pythonrun.h/c pair into 3 separate header/impl pairs: * bootstrap.h/bootstrap.c * shutdown.h/shutdown.c * pythonrun.h/pythonrun.c
I don't think separating bootstrap from shutdown is a good idea. They are quite closely related since one undoes what the other did (and they may also use shared private functions or data).
It was deliberate - a big part of PEP 432 is making sure that all the interpreter state lives *in* the interpreter state (as part of the config struct). Splitting the two into separate compilation modules makes it possible to ensure that all communication goes via the interpreter configuration (statics in other modules are still a problem, but also mostly out of scope for PEP 432). I *really* want to get us to clean phase separation of "the interpreter is starting up", "the interpreter is running normally" and "the interpreter is shutting down". I found that to be incredibly difficult to do when they were all intermixed in one file, which is why I decided to enlist the compiler's help by separating them.
I don't know what goes in the remaining "pythonrun.c", could you detail a bit?
While they have some of the PEP 432 changes in them, the header files in the branch give the general flavour of the separation:

Bootstrap is mostly get/init type functions: https://bitbucket.org/ncoghlan/cpython_sandbox/src/ae7fef62b462fb6b559172bd4...

Pythonrun is mostly PyRun_*, PyParser_*, Py_Compile* and a few other odds and ends: https://bitbucket.org/ncoghlan/cpython_sandbox/src/ae7fef62b462fb6b559172bd4...

Shutdown covers the various finalisers, atexit handling, etc: https://bitbucket.org/ncoghlan/cpython_sandbox/src/ae7fef62b462fb6b559172bd4...

Cheers, Nick.

On Tue, May 28, 2013 at 9:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Unfortunately, I don't know any other short word for "things with main functions that we ship to end users" :)
We used to call such things "programs", but that term may no longer be in popular parlance. :-) Or is it just too long? -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein

2013/5/28 Nick Coghlan <ncoghlan@gmail.com>:
On Tue, May 28, 2013 at 10:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 28 May 2013 22:15:25 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
I have a feature branch where I'm intermittently working on the bootstrapping changes described in PEP 432.
As part of those changes, I've cleaned up a few aspects of the repo layout:
* moved the main executable source file from Modules to a separate Apps directory
Sounds fine (I don't like "Apps" much, but hey :-)).
Unfortunately, I don't know any other short word for "things with main functions that we ship to end users" :)
"Bin" is quite common (if ironic). I think it would be fine too if that stuff was in Python/; anywhere is better than Modules. (Care to move the GC, too?) -- Regards, Benjamin

On 28.05.13 16:07, Nick Coghlan wrote:
On Tue, May 28, 2013 at 10:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 28 May 2013 22:15:25 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
* moved the main executable source file from Modules to a separate Apps directory Sounds fine (I don't like "Apps" much, but hey :-)). Unfortunately, I don't know any other short word for "things with main functions that we ship to end users" :)
main

On Wed, May 29, 2013 at 12:03 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 28.05.13 16:07, Nick Coghlan wrote:
On Tue, May 28, 2013 at 10:31 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 28 May 2013 22:15:25 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
* moved the main executable source file from Modules to a separate Apps directory
Sounds fine (I don't like "Apps" much, but hey :-)).
Unfortunately, I don't know any other short word for "things with main functions that we ship to end users" :)
main
IIRC, the reason I avoided that originally was due to the potential confusion between C's main and Python's main. I don't know why I didn't think of Fred's suggestion of "Programs" - I think that contrasts nicely with Modules, so I'd like to run with that.

Cleanly separating out the main functions affected the PEP 432 feature branch directly because the whole point of that PEP is to make all of them simpler by moving more of the relevant code into the shared library.

However, I really *don't* want to dive into the seemingly random allocation of some things between the Python/ subdir and the Modules/ subdir. If there's a consistent pattern there, I think it may be lost somewhere back in the 20th century, as I've never been able to figure one out...

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, 28 May 2013 23:07:37 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
It was deliberate - a big part of PEP 432 is making sure that all the interpreter state lives *in* the interpreter state (as part of the config struct). Splitting the two into separate compilation modules makes it possible to ensure that all communication goes via the interpreter configuration (statics in other modules are still a problem, but also mostly out of scope for PEP 432).
I *really* want to get us to clean phase separation of "the interpreter is starting up", "the interpreter is running normally" and "the interpreter is shutting down". I found that to be incredibly difficult to do when they were all intermixed in one file, which is why I decided to enlist the compiler's help by separating them.
It sounds a bit exaggerated. We have encoders and decoders in the same (C) modules, compressors and decompressors ditto. Why not keep initialization and finalization in the same source file too? (How long are the resulting C files?)
I don't know what goes in the remaining "pythonrun.c", could you detail a bit?
While they have some of the PEP 432 changes in them, the header files in the branch give the general flavour of the separation:
Bootstrap is mostly get/init type functions: https://bitbucket.org/ncoghlan/cpython_sandbox/src/ae7fef62b462fb6b559172bd4...
Pythonrun is mostly PyRun_*, PyParser_*, Py_Compile* and a few other odds and ends: https://bitbucket.org/ncoghlan/cpython_sandbox/src/ae7fef62b462fb6b559172bd4...
Shutdown covers the various finalisers, atexit handling, etc: https://bitbucket.org/ncoghlan/cpython_sandbox/src/ae7fef62b462fb6b559172bd4...
The fact that PyXXX_Init() and PyXXX_Fini() end up in different header files looks like a red flag to me, modularization-wise. I agree with separating the PyRun_* stuff from the initialization/finalization routines, though. Regards Antoine.

On 28.05.13 18:20, Antoine Pitrou wrote:
On Tue, 28 May 2013 23:07:37 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
It was deliberate - a big part of PEP 432 is making sure that all the interpreter state lives *in* the interpreter state (as part of the config struct).
It sounds a bit exaggerated. We have encoders and decoders in the same (C) modules, compressors and decompressors ditto. Why not keep initialization and finalization in the same source file too?
I can sympathize with the motivation. Unlike encoders and decoders, it is *very* tempting to put interpreter state into global variables. With encoders and decoders, it's clear that globals won't work if you have multiple of them. With interpreter state, it's either singletons in the first place, or the globals can be swapped out when switching interpreters. By splitting initialization and finalization into distinct translation units, you make it much more difficult to introduce new "hidden" variables. Regards, Martin
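
Martin's point about "hidden" variables can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from the PEP 432 branch: `demo_config`, `demo_init` and `demo_fini` are hypothetical names, and the comments mark where the bootstrap.c / shutdown.c boundary would fall if the two halves were compiled separately. Because a file-scope `static` is only visible inside its own translation unit, anything the shutdown half needs has to travel through the config struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an interpreter configuration struct. */
typedef struct {
    char *program_name;   /* owned by the config, released at shutdown */
    int initialized;      /* 1 between demo_init() and demo_fini() */
} demo_config;

/* ---- would live in bootstrap.c ---- */
/* A `static char *saved_name;` declared here would be invisible to
   shutdown.c, so shutdown code could not quietly depend on it. */
int demo_init(demo_config *cfg, const char *name)
{
    assert(cfg != NULL && name != NULL);
    size_t len = strlen(name) + 1;
    cfg->program_name = malloc(len);
    if (cfg->program_name == NULL) {
        return -1;                     /* allocation failed */
    }
    memcpy(cfg->program_name, name, len);
    cfg->initialized = 1;
    return 0;
}

/* ---- would live in shutdown.c ---- */
/* Everything this function touches arrives through `cfg`. */
void demo_fini(demo_config *cfg)
{
    assert(cfg != NULL);
    free(cfg->program_name);
    cfg->program_name = NULL;
    cfg->initialized = 0;
}
```

If `demo_fini` tried to read a static defined in the bootstrap half, the build would fail once the two halves were separate files, which is the compiler assistance described above.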

On Wed, May 29, 2013 at 2:47 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
On 28.05.13 18:20, Antoine Pitrou wrote:
On Tue, 28 May 2013 23:07:37 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
It was deliberate - a big part of PEP 432 is making sure that all the interpreter state lives *in* the interpreter state (as part of the config struct).
It sounds a bit exaggerated. We have encoders and decoders in the same (C) modules, compressors and decompressors ditto. Why not keep initialization and finalization in the same source file too?
I can sympathize with the motivation. Unlike encoders and decoders, it is *very* tempting to put interpreter state into global variables. With encoders and decoders, it's clear that globals won't work if you have multiple of them. With interpreter state, it's either singletons in the first place, or the globals can be swapped out when switching interpreters.
By splitting initialization and finalization into distinct translation units, you make it much more difficult to introduce new "hidden" variables.
Yep, that was a key part of my motivation (the other part was also to find out what global state we *already had* by making the build blow up for anything that was static and referenced by more than just the bootstrapping code).

The part I didn't think through when I did it in a long-lived branch was just how much of a nightmare it was going to make any merges that touched pythonrun.h or pythonrun.c :)

I'd also be open to a setup with a single "lifecycle.h" header file, which was split into the bootstrap and shutdown implementation units, since that makes it easier to check that the appropriate setup/finalize pairs exist (by looking at the combined header file), while still enlisting the build chain's assistance in avoiding hidden global state.

Anyway, I'll come up with some specific patches and put them on the tracker, starting with moving the source files for the binary executables and making the simpler pythonrun/lifecycle split. I can look into splitting lifecycle.c into separate bootstrap and shutdown translation units after those less controversial changes have been reviewed (the split may not even be all that practical outside the PEP 432 branch, since it would involve exposing quite a few currently static variables to the linker).

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
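
The "single header, two implementation units" layout, and the cost of exposing a formerly static variable to the linker, might look roughly like the sketch below. All names (`demo_lifecycle_init`, `demo_lifecycle_fini`, `demo_lifecycle_ready`) are hypothetical and the three sections are shown in one file only so the example compiles on its own; in a real tree they would be lifecycle.h, bootstrap.c and shutdown.c.

```c
#include <assert.h>

/* ---- lifecycle.h (hypothetical): setup/finalize pairs side by side ---- */
int  demo_lifecycle_init(void);   /* implemented in bootstrap.c */
void demo_lifecycle_fini(void);   /* implemented in shutdown.c  */

/* In a single lifecycle.c this flag could stay `static`. Once both
   units need it, it must get external linkage, which makes the
   shared state visible in the header instead of hidden in one file. */
extern int demo_lifecycle_ready;

/* ---- bootstrap.c (hypothetical) ---- */
int demo_lifecycle_ready = 0;

int demo_lifecycle_init(void)
{
    if (demo_lifecycle_ready) {
        return -1;                /* already initialised */
    }
    demo_lifecycle_ready = 1;
    return 0;
}

/* ---- shutdown.c (hypothetical) ---- */
void demo_lifecycle_fini(void)
{
    /* In this sketch, finalising without a prior init is treated as a bug. */
    assert(demo_lifecycle_ready == 1);
    demo_lifecycle_ready = 0;
}
```

Reading only the header section is enough to confirm that the init/fini pair exists, and every `extern` listed there is a piece of global state that can no longer be overlooked.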

On Wed, May 29, 2013 at 12:19 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Anyway, I'll come up with some specific patches and put them on the tracker, starting with moving the source files for the binary executables and making the simpler pythonrun/lifecycle split. I can look into splitting lifecycle.c into separate bootstrap and shutdown translation units after those less controversial changes have been reviewed (the split may not even be all that practical outside the PEP 432 branch, since it would involve exposing quite a few currently static variables to the linker).
I started with the simplest part, adding a new Programs directory: http://bugs.python.org/issue18093 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 28.05.13 15:07, Nick Coghlan wrote:
Sounds fine (I don't like "Apps" much, but hey :-)).
Unfortunately, I don't know any other short word for "things with main functions that we ship to end users" :)
Bike-sheddingly: POSIX calls them "commands and utilities": https://www2.opengroup.org/ogsys/catalog/c436 Regards, Martin

On May 28, 2013, at 10:15 PM, Nick Coghlan wrote:
Would anyone object if I went ahead and posted patches for making these changes to the main repo?
When you say "post[ed] patches", do you mean you want to put them some place for us to review? If so, sure, go ahead of course. -Barry

participants (8)
- "Martin v. Löwis"
- a.cavallo@cavallinux.eu
- Antoine Pitrou
- Barry Warsaw
- Benjamin Peterson
- Fred Drake
- Nick Coghlan
- Serhiy Storchaka