RFC: PEP 587 "Python Initialization Configuration": 2nd version
Hi,
Thanks to Steve Dower's feedback, I enhanced and completed my PEP 587.
Main changes:
* It is now possible to read the configuration and then modify the
read configuration. For example, new directories can be added to
PyConfig.module_search_paths (see the example below and the example in
the PEP)
* PyConfig is now "dynamic" by default: strings are duplicated and
PyConfig_Clear() must be called to release memory
* PyConfig now only uses wchar_t* for strings (unicode): char* (bytes)
is no longer used. I had to hack CPython internals for that :-)
* I added a "_config_version" private field to PyPreConfig and
PyConfig to prepare the backward compatibility for future changes.
* I removed the Open Question section: all known issues have been fixed.
During the Language Summit, Brett Cannon said that Steve Dower
declined the offer to be the BDFL-delegate for this PEP. Thomas
Wouters proposed himself to be the new BDFL-delegate.
Example to read the configuration, append a directory to sys.path
(module_search_paths) and then initialize Python with this
configuration:
void init_python(void)
{
    PyInitError err;
    PyConfig config = PyConfig_INIT;

    err = PyConfig_Read(&config);
    if (_Py_INIT_FAILED(err)) {
        goto fail;
    }

    err = PyWideStringList_Append(&config.module_search_paths,
                                  L"/path/to/more/modules");
    if (_Py_INIT_FAILED(err)) {
        goto fail;
    }

    err = Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        goto fail;
    }

    PyConfig_Clear(&config);
    return;

fail:
    PyConfig_Clear(&config);
    Py_ExitInitError(err);
}
The HTML version will be online shortly:
https://www.python.org/dev/peps/pep-0587/
Full text below.
Victor
PEP: 587
Title: Python Initialization Configuration
Author: Nick Coghlan
On Thursday, May 02, 2019 Victor Stinner
According to this
* ``run_command`` (``wchar_t*``): ``-c COMMAND`` argument
* ``run_filename`` (``wchar_t*``): ``python3 SCRIPT`` argument
* ``run_module`` (``wchar_t*``): ``python3 -m MODULE`` argument
this
``-c COMMAND`` ``run_module = COMMAND`` should read "run_command = COMMAND". Typo, not?
On Thu, May 2, 2019 at 16:20, Edwin Zimmerman
``-c COMMAND`` ``run_module = COMMAND`` should read "run_command = COMMAND". Typo, not?
Oops, you're right: it's a typo. Now fixed:
``-c COMMAND`` ``run_command = COMMAND``
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
On Fri, May 3, 2019 at 4:59, Victor Stinner
* PyConfig now only uses wchar_t* for strings (unicode): char* (bytes) is no longer used. I had to hack CPython internals for that :-)
I prefer char* to wchar_t* on Unix. Since UTF-8 has come to dominate the
Unix world over the past decades, wchar_t* is less useful on Unix nowadays.
Is it impossible to use just char* on Unix and wchar_t* on Windows?
--
Inada Naoki
Hi INADA-san,
This PEP is the result of 2 years of refactoring to *simplify* the
*implementation*. I agree that byte strings are the native type on Unix.
But on Windows, Unicode is the native type, and in Python 3, Unicode is
the native type as well. One key to the simplified implementation is the
single PyConfig structure. It means that all platforms have to use the same types.
I love the idea of using only wchar_t* for PyConfig because it makes Python
initialization more reliable. The question of the encoding used to decode
byte strings and any possible decoding error (very unlikely thanks to
surrogateescape) is better defined: it occurs when you set the parameter,
not "later during init".
The PEP adds Py_UnixMain() for most trivial use cases, and
PyConfig_DecodeLocale() and PyConfig_SetArgs() for more advanced cases.
Victor
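As an illustration of decoding at the moment a parameter is set, here is a minimal sketch in plain C. It does not use the PEP 587 API: `demo_config`, `demo_config_set_bytes()` and `demo_config_clear()` are hypothetical names, and the standard `mbstowcs()` stands in for CPython's own decoder, so there is no surrogateescape handling. The `mbstowcs(NULL, ...)` length query relies on POSIX behaviour, which is an assumption here.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for a configuration structure; this is not PyConfig. */
typedef struct {
    wchar_t *home;   /* owned copy, released by demo_config_clear() */
} demo_config;

/* Decode a locale-encoded byte string and store an owned copy in *field.
   Returns 0 on success, -1 on a decoding or allocation error, so a failure
   is reported where the parameter is set and not later during init. */
static int demo_config_set_bytes(wchar_t **field, const char *bytes)
{
    assert(field != NULL && bytes != NULL);

    /* POSIX: with a NULL destination, mbstowcs() only computes the length. */
    size_t len = mbstowcs(NULL, bytes, 0);
    if (len == (size_t)-1) {
        return -1;   /* invalid multibyte sequence for the current locale */
    }

    wchar_t *copy = malloc((len + 1) * sizeof(wchar_t));
    if (copy == NULL) {
        return -1;
    }
    mbstowcs(copy, bytes, len + 1);   /* len + 1 leaves room for the L'\0' */

    free(*field);   /* drop any previous value */
    *field = copy;
    return 0;
}

/* Release every string owned by the structure. */
static void demo_config_clear(demo_config *config)
{
    free(config->home);
    config->home = NULL;
}
```

A caller would zero-initialize a `demo_config`, call `demo_config_set_bytes(&config.home, "/opt/app")`, check the return value immediately, and call `demo_config_clear()` on every exit path, much like the `PyConfig_Clear()` calls in the example at the top of the thread.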
Hi,

First of all, I just found an old issue that will be solved by my PEP 587 :-)

Add Py_SetFatalErrorAbortFunc: Allow embedding program to handle fatal errors
https://bugs.python.org/issue30560

I studied the code of applications embedding Python. Most of them have to
decode byte strings to get wchar_t* to set home, argv, program name,
etc. I'm not sure that they use the "correct" encoding, especially
since Python 3.7 got UTF-8 Mode (PEP 540) and C locale coercion (PEP 538).

I tried to convert the source code of each project into pseudo-code
which looks like the C code used in CPython.

I removed all error handling code: look at each reference, the original
code is usually way more complex.

Some projects have to wrap each function of the Python C API manually,
which adds even more boilerplate code.

Some projects set/unset environment variables. Others prefer global
configuration variables like Py_NoSiteFlag.

It seems like Py_FrozenFlag is commonly used. Maybe I should make the
flag public and try to find it a better name:

    /* If greater than 0, suppress _PyPathConfig_Calculate() warnings.

       If set to -1 (default), inherit Py_FrozenFlag value. */
    int _frozen;

About pyinstaller, which changes C standard stream buffering:
Py_Initialize() now also does that when buffered_stdio=0. See
config_init_stdio() in Python/coreconfig.c. Moreover, this function
now *always* sets standard streams to O_BINARY mode on Windows. I'm not
sure if it's correct or not.


Blender
-------

Pseudo-code of BPY_python_start::

    BLI_strncpy_wchar_from_utf8(program_path_wchar, BKE_appdir_program_path());
    Py_SetProgramName(program_path_wchar);
    PyImport_ExtendInittab(bpy_internal_modules);
    Py_SetPythonHome(py_path_bundle_wchar);
    Py_SetStandardStreamEncoding("utf-8", "surrogateescape");
    Py_NoSiteFlag = 1;
    Py_FrozenFlag = 1;
    Py_Initialize();

Ref: https://git.blender.org/gitweb/gitweb.cgi/blender.git/blob/HEAD:/source/blen...
fontforge
---------

Pseudo-code of fontforge when Python is used to run a script::

    Py_Initialize()
    for init_file in init_files:
        PyRun_SimpleFileEx(init_file)
    exitcode = Py_Main(arg, argv)
    Py_Finalize()
    exit(exitcode)

Ref: https://bugs.python.org/issue36204#msg337256


py2app
------

Pseudo-code::

    unsetenv("PYTHONOPTIMIZE");
    unsetenv("PYTHONDEBUG");
    unsetenv("PYTHONDONTWRITEBYTECODE");
    unsetenv("PYTHONIOENCODING");
    unsetenv("PYTHONDUMPREFS");
    unsetenv("PYTHONMALLOCSTATS");
    setenv("PYTHONDONTWRITEBYTECODE", "1", 1);
    setenv("PYTHONUNBUFFERED", "1", 1);
    setenv("PYTHONPATH", build_python_path(), 1);

    setlocale(LC_ALL, "en_US.UTF-8");
    mbstowcs(w_program, c_program, PATH_MAX+1);
    Py_SetProgramName(w_program);

    Py_Initialize()

    argv_new[0] = _Py_DecodeUTF8_surrogateescape(script, strlen(script));
    ...
    PySys_SetArgv(argc, argv_new);

    PyRun_SimpleFile(fp, script);
    Py_Finalize();

Ref: https://bitbucket.org/ronaldoussoren/py2app/src/default/py2app/apptemplate/s...

See also: https://bitbucket.org/ronaldoussoren/py2app/src/default/py2app/bundletemplat...
OpenOffice
----------

Pseudo-code of ``PythonInit``::

    mbstowcs(wide, home, PATH_MAX + 1);
    Py_SetPythonHome(wide);
    setenv("PYTHONPATH", getenv("PYTHONPATH") + ":" + path_bootstrap);
    PyImport_AppendInittab("pyuno", PyInit_pyuno);
    Py_DontWriteBytecodeFlag = 1;
    Py_Initialize();

Ref: pyuno/source/loader/pyuno_loader.cxx, see:
https://docs.libreoffice.org/pyuno/html/pyuno__loader_8cxx_source.html


vim
---

Pseudo-code::

    mbstowcs(py_home_buf, p_py3home);
    Py_SetPythonHome(py_home_buf);
    PyImport_AppendInittab("vim", Py3Init_vim);
    Py_Initialize();

Ref: https://github.com/vim/vim/blob/master/src/if_python3.c


pyinstaller
-----------

Pseudo-code::

    pyi_locale_char2wchar(progname_w, status->archivename)
    SetProgramName(progname_w);

    pyi_locale_char2wchar(pyhome_w, status->mainpath)
    SetPythonHome(pyhome_w);

    pypath_w = build_path();
    Py_SetPath(pypath_w);

    Py_NoSiteFlag = 1;
    Py_FrozenFlag = 1;
    Py_DontWriteBytecodeFlag = 1;
    Py_NoUserSiteDirectory = 1;
    Py_IgnoreEnvironmentFlag = 1;
    Py_VerboseFlag = 0;
    Py_OptimizeFlag = 1;

    if (unbuffered) {
    #ifdef _WIN32
        _setmode(fileno(stdin), _O_BINARY);
        _setmode(fileno(stdout), _O_BINARY);
    #endif
        setbuf(stdin, (char *)NULL);
        setbuf(stdout, (char *)NULL);
        setbuf(stderr, (char *)NULL);
    }

    Py_Initialize();

    PySys_SetPath(pypath_w);
    PySys_SetArgvEx(argc, wargv, 0);

Ref: https://github.com/pyinstaller/pyinstaller/blob/1844d69f5aa1d64d3feca912ed16...

Victor
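The OpenOffice pseudo-code joins C strings with `+`, which is shorthand and not valid C, and it glosses over the case where PYTHONPATH is unset. A minimal sketch of that join in plain C follows; `join_pythonpath()` is a hypothetical helper written for this illustration (it is not taken from OpenOffice or CPython), and the `:` separator assumes Unix.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join an existing PYTHONPATH value (which may be NULL
   or empty) with one extra directory, using the Unix ':' separator.
   Returns a malloc()ed string that the caller must free(), or NULL if the
   allocation fails. */
static char *join_pythonpath(const char *existing, const char *extra)
{
    assert(extra != NULL);

    if (existing == NULL || existing[0] == '\0') {
        /* Nothing to prepend: return a copy of the extra directory alone,
           which avoids producing a leading ':' entry. */
        char *copy = malloc(strlen(extra) + 1);
        if (copy != NULL) {
            strcpy(copy, extra);
        }
        return copy;
    }

    /* existing + ':' + extra + terminating null byte */
    size_t size = strlen(existing) + 1 + strlen(extra) + 1;
    char *joined = malloc(size);
    if (joined == NULL) {
        return NULL;
    }
    snprintf(joined, size, "%s:%s", existing, extra);
    return joined;
}
```

An embedder on Unix would pass `getenv("PYTHONPATH")` as `existing`, hand the result to `setenv("PYTHONPATH", joined, 1)` before calling `Py_Initialize()`, and then free it.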
On 10May2019 1832, Victor Stinner wrote:
Hi,
First of all, I just found an old issue that will be solved by my PEP 587 :-)
Add Py_SetFatalErrorAbortFunc: Allow embedding program to handle fatal errors https://bugs.python.org/issue30560
Yes, this should be a feature of any redesigned embedding API.
I studied the code of applications embedding Python. Most of them have to decode byte strings to get wchar_t* to set home, argv, program name, etc. I'm not sure that they use the "correct" encoding, especially since Python 3.7 got UTF-8 Mode (PEP 540) and C locale coercion (PEP 538).
Unless you studied Windows-only applications embedding Python, _all_ of them will have had to decode strings into Unicode, since that's what our API expects.

All of the Windows-only applications I know of that embed Python are closed source, and none are owned by Red Hat. I'm going to assume you missed that entire segment of the ecosystem :)

But it also seems like perhaps we just need to expose a single API that does "decode this like CPython would" so that they can call it? We don't need a whole PEP or a widely publicised and discussed redesign of embedding to add this, and since it would solve a very real problem then we should just do it.
I tried to convert the source code of each project into pseudo-code which looks like C code used in CPython.
Thanks, this is helpful!

My take:
* all the examples are trying to be isolated from the system Python install (except Vim?)
* all the examples want to import some of their own modules before running user code
* nobody understands how to configure embedded Python :)

Also from my own work with/on other projects:
* embedders need to integrate native thread management with Python threads
* embedders want to use their own files/libraries
* embedders want to totally override getpath, not augment/configure it

Cheers,
Steve
On Mon, May 13, 2019 at 18:28, Steve Dower
My take:
* all the examples are trying to be isolated from the system Python install (except Vim?)
"Isolation" means different things:

* ignore configuration files
* ignore environment variables
* custom path configuration (sys.path, sys.executable, etc.)

It seems like the most common need is to have a custom path configuration.

Py_IsolatedFlag isn't used. Only py2app manually ignores a few environment variables.
* all the examples want to import some of their own modules before running user code
Well, running code between Py_Initialize() and running the final Python code is not new, and my PEP doesn't change anything here: it's still possible, as it was previously. You can use PyRun_SimpleFile() after Py_Initialize() for example. Maybe I misunderstood your point.
* nobody understands how to configure embedded Python :)
Well, that's the problem I'm trying to solve by designing a homogeneous API, rather than scattered global configuration variables, environment variables, function calls, etc.
Also from my own work with/on other projects: * embedders need to integrate native thread management with Python threads
Sorry, I don't see the relationship with the initialization.
* embedders want to use their own files/libraries
That's the path configuration, no?
* embedders want to totally override getpath, not augment/configure it
On Python 3.7, Py_SetPath() is the closest thing to configuring the path
configuration. But I don't see how to override sys.executable
(Py_GetProgramFullPath), sys.prefix, sys.exec_prefix, nor (internal) dll_path.

In the examples that I found, SetProgramName(), SetPythonHome() and
Py_SetPath() are called.

My PEP 587 allows completely ignoring getpath.c/getpathp.c easily by
setting explicitly:

* use_module_search_paths, module_search_paths
* executable
* prefix
* exec_prefix
* dll_path (Windows only)

If you set these fields, you fully control where Python looks for
modules. Extract of the C code:

    /* Do we need to calculate the path? */
    if (!config->use_module_search_paths
        || (config->executable == NULL)
        || (config->prefix == NULL)
    #ifdef MS_WINDOWS
        || (config->dll_path == NULL)
    #endif
        || (config->exec_prefix == NULL))
    {
        _PyInitError err = _PyCoreConfig_CalculatePathConfig(config);
        if (_Py_INIT_FAILED(err)) {
            return err;
        }
    }

OpenOffice doesn't bother with complex code, it just appends a path to
PYTHONPATH:

    setenv("PYTHONPATH", getenv("PYTHONPATH") + ":" + path_bootstrap);

It can use PyWideStringList_Append(&config.module_search_paths,
path_bootstrap), as shown in one example of my PEP.

Victor
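The "do we need to calculate the path?" check quoted above can be mirrored in a standalone sketch to make the rule easy to test: path calculation is skipped only when every field is set. This is an illustration in plain C, not CPython code: `demo_path_config` and `needs_path_calculation()` are hypothetical names, and the Windows-only `dll_path` field is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical, trimmed-down stand-in for the path-related fields;
   this is not PyConfig and omits the Windows-only dll_path. */
typedef struct {
    int use_module_search_paths;   /* non-zero if module_search_paths is set */
    const wchar_t *executable;
    const wchar_t *prefix;
    const wchar_t *exec_prefix;
} demo_path_config;

/* Return true if at least one field is missing, meaning the default
   path computation would still have to run to fill it in. */
static bool needs_path_calculation(const demo_path_config *config)
{
    assert(config != NULL);
    return !config->use_module_search_paths
        || config->executable == NULL
        || config->prefix == NULL
        || config->exec_prefix == NULL;
}
```

Under this sketch, an embedder that sets only the module search paths would still trigger the calculation; it is bypassed only once `executable`, `prefix` and `exec_prefix` are set as well.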
In response to all of your responses:

No need to take offense, I was merely summarising the research you posted in a way that looks more like scenarios or requirements. It's a typical software engineering task.

Being able to collect snippets and let people draw their own conclusions is one thing, but those of us (including yourself) who are actively working in this area are totally allowed to present our analysis as well.

Given the raw material, the summary, and the recommendations, anyone else can do the same analysis and join the discussion, and that's what we're doing. But you can't simply present raw material and assume that people will naturally end up at the same conclusion - that's how you end up with overly simplistic plans where everyone "agrees" because they projected their own opinions into it, then are surprised when it turns out that other people had different opinions.

Cheers,
Steve
On 10May2019 1832, Victor Stinner wrote:
I studied the code of applications embedding Python. Most of them have to decode byte strings to get wchar_t* to set home, argv, program name, etc. I'm not sure that they use the "correct" encoding, especially since Python 3.7 got UTF-8 Mode (PEP 540) and C locale coercion (PEP 538).
It looks like Py_DecodeLocale() is available very early on - why wouldn't we recommend using this function? It seems to be nearly a drop-in replacement for mbstowcs in the samples, and if memory allocation is a big deal perhaps we could just add a version that writes to a buffer?

That would provide a supported workaround for the encoding issues and unblock people hitting trouble right now, yes?

Cheers,
Steve
On Tue, May 14, 2019, 19:52 Steve Dower
It looks like Py_DecodeLocale() is available very early on - why wouldn't we recommend using this function? It seems to be nearly a drop-in replacement for mbstowcs in the samples, and if memory allocation is a big deal perhaps we could just add a version that writes to a buffer?
Actually, it is recommended in the docs:
https://docs.python.org/3/c-api/init.html#c.Py_SetPythonHome

Sebastian
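To make the "version that writes to a buffer" idea concrete, here is a rough sketch of what such a helper could look like. It is purely illustrative: `decode_into_buffer()` is a hypothetical name, not a CPython function, and it uses the standard `mbstowcs()` as a stand-in, so unlike Py_DecodeLocale() it does not apply the surrogateescape error handler.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of the CPython API: decode a locale-encoded
   byte string into a caller-provided buffer of buflen wide characters.
   Returns 0 on success, or -1 on a decoding error or when the result
   (including the terminating null) does not fit. On failure the buffer
   is left holding an empty string. */
static int decode_into_buffer(const char *bytes, wchar_t *buf, size_t buflen)
{
    assert(bytes != NULL && buf != NULL);
    if (buflen == 0) {
        return -1;
    }

    /* mbstowcs() writes at most buflen wide characters and returns the
       count without the terminator; a return value equal to buflen means
       the output was truncated and is not null-terminated. */
    size_t n = mbstowcs(buf, bytes, buflen);
    if (n == (size_t)-1 || n == buflen) {
        buf[0] = L'\0';
        return -1;
    }
    return 0;
}
```

Compared with the bare `mbstowcs(wide, home, PATH_MAX + 1)` calls in the samples, the difference is that truncation and invalid input are reported to the caller and not silently ignored.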