
Hi,

Serhiy Storchaka seems to be worried by the high number of commits in https://bugs.python.org/issue32030 "PEP 432: Rewrite Py_Main()", so let me explain the context of this work :-)

To prepare CPython for the implementation of my UTF-8 Mode PEP (PEP 540), I worked on the implementation of Nick Coghlan's PEP 432:

PEP 432 -- Restructuring the CPython startup sequence
https://www.python.org/dev/peps/pep-0432/

The startup sequence is a big pile of code made of multiple functions: main(), Py_Main(), Py_Initialize(), Py_Finalize()... and a lot of tiny "configuration" functions like Py_SetPath().

Over the years, many configuration options were added in the middle of the code. The priority of configuration options is not always correct between command line options, environment variables, configuration files (like "pyvenv.cfg"), etc.

For technical reasons, it's hard to properly implement the -E option (ignore PYTHON* environment variables). For example, the new PYTHONCOERCECLOCALE environment variable (of PEP 538) doesn't handle -E properly (it ignores -E), because it was too complex to support -E. I'm working on fixing this.

In the last few weeks, I mostly worked on the Py_Main() function, Modules/getpath.c and PC/getpathp.c, to "refactor" the code:

* Split big functions (300 to 500 lines) into multiple small functions (50 lines or less), to make it easier to follow the control flow and to move code around.
* Replace static and global variables with memory allocated on the heap.
* Reorganize how the configuration is read: populate a first temporary structure (_PyMain using wchar_t*), then create Python objects (_PyMainInterpreterConfig), and finish with the real configuration (like setting attributes of the sys module). The goal is to centralize all code reading the configuration, to fix the priority and to simplify the code.

My motivation was to write a correct implementation of the UTF-8 Mode (PEP 540). Nick's motivation is to make CPython easier to embed.
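To illustrate what "priority of configuration options" means in practice, here is a minimal sketch of how a single option could be resolved in one central place. It is not the actual CPython code: the demo_cmdline structure and the demo_resolve_utf8_mode() function are hypothetical names invented for this example, and the assumed order is command line first, then the environment variable (skipped when -E is given), then the default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified option state; not CPython's real structures. */
typedef struct {
    int ignore_environment;   /* set by -E */
    int cmdline_utf8_mode;    /* value given on the command line, -1 means unset */
} demo_cmdline;

/* Resolve one option with the assumed priority: command line, then
   environment variable (skipped when -E is given), then the default.
   env_value is passed explicitly so the function is easy to test;
   a caller would typically pass the result of getenv(). */
static int
demo_resolve_utf8_mode(const demo_cmdline *cmdline, const char *env_value)
{
    assert(cmdline != NULL);

    /* 1. An explicit command line option always wins. */
    if (cmdline->cmdline_utf8_mode >= 0) {
        return cmdline->cmdline_utf8_mode;
    }

    /* 2. The environment variable is only consulted without -E. */
    if (!cmdline->ignore_environment
        && env_value != NULL && env_value[0] != '\0') {
        return strcmp(env_value, "0") != 0;
    }

    /* 3. Fall back to the default (disabled). */
    return 0;
}
```

Because the -E check lives in the same function as the environment lookup, an option resolved this way cannot forget to honor -E, which is the kind of bug described above.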
Nick's plan for Python 3.8 is to give access to the new _PyCoreConfig and _PyMainInterpreterConfig structures in order to:

* easily give access to most (if not all?) configuration options to "embedders"
* allow configuring Python without environment variables, command line options or configuration files, using only these structures
* allow configuring Python using Python objects (PyObject*) rather than C types (like wchar_t*)

(I'm not sure that I understood correctly, so please read PEP 432 ;-))

IMHO the most visible change of PEP 432 is to split Python initialization in two parts:

* Core: strict minimum to use the Python C API
* Main: everything else

The goal is to introduce the opportunity to configure Python between Core and Main.

The implementation is currently a work-in-progress. Neither Nick nor I will have the bandwidth to update his PEP and finish the implementation before Python 3.7, so this work remains private until at least Python 3.8.

Another part of the work is to enhance the documentation. For example, you can now find an explicit list of C functions which can be called before Py_Initialize():

https://docs.python.org/dev/c-api/init.html#before-python-initialization

And also a list of functions that must not be called before Py_Initialize(), whereas you might want to call them :-)

Victor
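The Core/Main split described above can be pictured as a small state machine. The sketch below is only an illustration of the idea, not the PEP 432 API: demo_runtime, demo_init_core(), demo_configure() and demo_init_main() are hypothetical names, and the only rule it models is that configuration is accepted between the Core and Main phases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical phases of a two-step initialization. */
typedef enum {
    DEMO_UNINITIALIZED,
    DEMO_CORE_READY,      /* strict minimum is available */
    DEMO_MAIN_READY       /* everything else is initialized */
} demo_phase;

typedef struct {
    demo_phase phase;
    int install_signal_handlers;   /* -1 means unset */
} demo_runtime;

/* Phase 1: bring up the core. Returns 0 on success, -1 on misuse. */
static int
demo_init_core(demo_runtime *rt)
{
    assert(rt != NULL);
    if (rt->phase != DEMO_UNINITIALIZED) {
        return -1;
    }
    rt->install_signal_handlers = -1;
    rt->phase = DEMO_CORE_READY;
    return 0;
}

/* Configuration is only accepted between Core and Main. */
static int
demo_configure(demo_runtime *rt, int install_signal_handlers)
{
    assert(rt != NULL);
    if (rt->phase != DEMO_CORE_READY) {
        return -1;
    }
    rt->install_signal_handlers = install_signal_handlers;
    return 0;
}

/* Phase 2: finish initialization, filling defaults for unset options. */
static int
demo_init_main(demo_runtime *rt)
{
    assert(rt != NULL);
    if (rt->phase != DEMO_CORE_READY) {
        return -1;
    }
    if (rt->install_signal_handlers < 0) {
        rt->install_signal_handlers = 1;   /* assumed default */
    }
    rt->phase = DEMO_MAIN_READY;
    return 0;
}
```

An embedder following this pattern would call demo_init_core(), adjust the settings it cares about with demo_configure(), and then call demo_init_main(); attempts to configure before Core or after Main are rejected.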

Currently, we have the following configuration options:

typedef struct {
    int ignore_environment; /* -E */
    int use_hash_seed;      /* PYTHONHASHSEED=x */
    unsigned long hash_seed;
    int _disable_importlib; /* Needed by freeze_importlib */
    const char *allocator;  /* Memory allocator: _PyMem_SetupAllocators() */
    int dev_mode;           /* -X dev */
    int faulthandler;       /* -X faulthandler */
    int tracemalloc;        /* -X tracemalloc=N */
    int import_time;        /* -X importtime */
    int show_ref_count;     /* -X showrefcount */
    int show_alloc_count;   /* -X showalloccount */
    int dump_refs;          /* PYTHONDUMPREFS */
    int malloc_stats;       /* PYTHONMALLOCSTATS */
    int utf8_mode;          /* -X utf8 or PYTHONUTF8 environment variable */
    wchar_t *module_search_path_env; /* PYTHONPATH environment variable */
    wchar_t *home;          /* PYTHONHOME environment variable,
                               see also Py_SetPythonHome(). */
    wchar_t *program_name;  /* Program name, see also Py_GetProgramName() */
} _PyCoreConfig;

and

typedef struct {
    int install_signal_handlers;
    PyObject *argv;               /* sys.argv list, can be NULL */
    PyObject *module_search_path; /* sys.path list */
    PyObject *warnoptions;        /* sys.warnoptions list, can be NULL */
    PyObject *xoptions;           /* sys._xoptions dict, can be NULL */
} _PyMainInterpreterConfig;

Victor

2017-12-14 16:16 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:

On 12/14/2017 10:16 AM, Victor Stinner wrote:
You could have made (and still could make) that a master issue with multiple dependencies. Last summer, I merged at least 20 patches for one idlelib file. I split them up among 1 master issue and about 6 dependency issues. That was essential because most of the patches were written by one of 3 new contributors and needed separate discussions about the strategy for a particular patch.

I completely agree with keeping PRs to a reviewable size.

-- Terry Jan Reedy

2017-12-14 22:54 GMT+01:00 Terry Reedy <tjreedy@udel.edu>:
I'm not sure that multiple issues are needed, since all these changes are related to Py_Main() or are very close to Py_Main(), and they implement what is defined in PEP 432.

Technically, I could push a single giant commit, but it would be impossible to review, even for myself, whereas I'm reading each change multiple times. I'm testing each change on Windows, macOS, Linux and FreeBSD to make sure that everything is fine. Py_Main() has a few functions specific to one platform, like Windows or macOS.

I also had to "iterate" on the code to move it slowly, step by step. I'm not really proud of all these refactoring changes :-( But I hope that "at the end", the code will be much easier to understand and to maintain.

Moreover, as I wrote, my intent is also to fix all the code handling configuration. For example, I just fixed the code to define sys.argv earlier. Now, sys.argv is defined very early in Python initialization. Previously, sys.argv was only defined after Py_Initialize() completed. For example, the site module could not access sys.argv:

Traceback (most recent call last):
  File "/home/vstinner/prog/python/3.6/Lib/site.py", line 600, in <module>
    print(sys.argv)
AttributeError: module 'sys' has no attribute 'argv'

I'm not sure that it's useful, but I was surprised that sys was only partially initialized before the site module was loaded.

Victor

Hi,

FYI I pushed a new change related to PEP 432: it becomes possible to completely skip the calculation of paths, especially sys.path. If you fill the "Path configuration outputs" fields of _PyCoreConfig (see below), _PyPathConfig_Init() should not be called. It should be helpful for some users when Python is embedded.

Sadly, this feature will not be exposed before Python 3.8: the PEP 432 APIs are currently private.

The _PyCoreConfig structure contains most parameters (not all yet) needed by Py_Initialize().

typedef struct {
    int install_signal_handlers; /* Install signal handlers? -1 means unset */
    int ignore_environment; /* -E, Py_IgnoreEnvironmentFlag */
    int use_hash_seed;      /* PYTHONHASHSEED=x */
    unsigned long hash_seed;
    const char *allocator;  /* Memory allocator: _PyMem_SetupAllocators() */
    int dev_mode;           /* PYTHONDEVMODE, -X dev */
    int faulthandler;       /* PYTHONFAULTHANDLER, -X faulthandler */
    int tracemalloc;        /* PYTHONTRACEMALLOC, -X tracemalloc=N */
    int import_time;        /* PYTHONPROFILEIMPORTTIME, -X importtime */
    int show_ref_count;     /* -X showrefcount */
    int show_alloc_count;   /* -X showalloccount */
    int dump_refs;          /* PYTHONDUMPREFS */
    int malloc_stats;       /* PYTHONMALLOCSTATS */
    int coerce_c_locale;    /* PYTHONCOERCECLOCALE, -1 means unknown */
    int coerce_c_locale_warn; /* PYTHONCOERCECLOCALE=warn */
    int utf8_mode;          /* PYTHONUTF8, -X utf8; -1 means unknown */

    wchar_t *program_name;  /* Program name, see also Py_GetProgramName() */
    int argc;               /* Number of command line arguments,
                               -1 means unset */
    wchar_t **argv;         /* Command line arguments */
    wchar_t *program;       /* argv[0] or "" */

    int nxoption;           /* Number of -X options */
    wchar_t **xoptions;     /* -X options */

    int nwarnoption;        /* Number of warnings options */
    wchar_t **warnoptions;  /* Warnings options */

    /* Path configuration inputs */
    wchar_t *module_search_path_env; /* PYTHONPATH environment variable */
    wchar_t *home;          /* PYTHONHOME environment variable,
                               see also Py_SetPythonHome(). */

    /* Path configuration outputs */
    int nmodule_search_path;        /* Number of sys.path paths,
                                       -1 means unset */
    wchar_t **module_search_paths;  /* sys.path paths */
    wchar_t *executable;    /* sys.executable */
    wchar_t *prefix;        /* sys.prefix */
    wchar_t *base_prefix;   /* sys.base_prefix */
    wchar_t *exec_prefix;   /* sys.exec_prefix */
    wchar_t *base_exec_prefix;  /* sys.base_exec_prefix */

    /* Private fields */
    int _disable_importlib; /* Needed by freeze_importlib */
} _PyCoreConfig;

and

typedef struct {
    int install_signal_handlers;  /* Install signal handlers? -1 means unset */
    PyObject *argv;               /* sys.argv list, can be NULL */
    PyObject *executable;         /* sys.executable str */
    PyObject *prefix;             /* sys.prefix str */
    PyObject *base_prefix;        /* sys.base_prefix str, can be NULL */
    PyObject *exec_prefix;        /* sys.exec_prefix str */
    PyObject *base_exec_prefix;   /* sys.base_exec_prefix str, can be NULL */
    PyObject *warnoptions;        /* sys.warnoptions list, can be NULL */
    PyObject *xoptions;           /* sys._xoptions dict, can be NULL */
    PyObject *module_search_path; /* sys.path list */
} _PyMainInterpreterConfig;

Victor
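The "-1 means unset" convention is what makes it possible to skip the path calculation. Here is a rough, self-contained sketch of that pattern. It does not use the private CPython structures: demo_path_config, demo_calculate_paths() and demo_init_paths() are hypothetical stand-ins, and the default paths are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical subset of the "Path configuration outputs" fields. */
typedef struct {
    int nmodule_search_path;              /* -1 means unset */
    const wchar_t **module_search_paths;  /* sys.path-like entries */
} demo_path_config;

/* Counts how often the (expensive) calculation actually runs. */
static int demo_calculate_calls = 0;

/* Made-up default search paths, for illustration only. */
static const wchar_t *demo_default_paths[] = {
    L"/usr/lib/demo",
    L"/usr/lib/demo/lib-dynload",
};

/* Stand-in for the real path calculation. */
static void
demo_calculate_paths(demo_path_config *config)
{
    demo_calculate_calls++;
    config->module_search_paths = demo_default_paths;
    config->nmodule_search_path = 2;
}

/* Only compute the paths when the embedder left the outputs unset. */
static void
demo_init_paths(demo_path_config *config)
{
    assert(config != NULL);
    if (config->nmodule_search_path >= 0) {
        return;   /* outputs already filled in: skip the calculation */
    }
    demo_calculate_paths(config);
}
```

An embedder that already knows its module search paths would fill both fields before initialization, and the calculation step would then be skipped entirely.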
