[Python-Dev] RFC: PEP 587 "Python Initialization Configuration": 3rd version

Victor Stinner vstinner at redhat.com
Thu May 16 13:25:06 EDT 2019


On Thu, 16 May 2019 at 16:10, Thomas Wouters <thomas at python.org> wrote:
>> Would you be ok with a "PyConfig_Init(PyConfig *config);" function
>> which would initialize all fields to their default values? Maybe
>> PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.
>>
>> You can find a similar API for pthread mutexes: there is an init function
>> *and* a macro for static initialization:
>>
>>        int pthread_mutex_init(pthread_mutex_t *restrict mutex,
>>            const pthread_mutexattr_t *restrict attr);
>>
>>        pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
>
>
> This was going to be my suggestion as well: for any non-trivial macro, we should have a function for it instead.

Ok, I will do that.
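
For illustration, here is a rough sketch of what such an init function
could look like (hypothetical code: the real default values are the ones
documented in the PEP, not the placeholders below; memset() needs
<string.h>):

    void PyConfig_Init(PyConfig *config)
    {
        /* Start from zeroed memory, like the static PyConfig_INIT macro,
           then set the fields whose default value is not 0/NULL. */
        memset(config, 0, sizeof(*config));
        config->use_environment = 1;   /* assumed default in "Python mode" */
        /* ... and so on for the remaining fields ... */
    }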


> I would also point out that PEP 587 has a code example that uses PyWideStringList_INIT, but that macro isn't mentioned anywhere else.

Oh, I forgot to document it better. Well, the macro is trivial:

#define _PyWstrList_INIT (_PyWstrList){.length = 0, .items = NULL}

For consistency, I prefer not to initialize these fields manually, but
to use a macro instead.

(Variables are allocated on the stack and so *must* be initialized.)
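
For example, with the public spelling from the PEP:

    PyWideStringList list = PyWideStringList_INIT;   /* length=0, items=NULL */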


> The PEP is a bit unclear as to the semantics of PyWideStringList as a whole: the example uses a static array with length, but doesn't explain what would happen with statically allocated data like that if you call the Append or Extend functions. It also doesn't cover how e.g. argv parsing would remove items from the list. (I would also suggest the PEP shouldn't use the term 'list', at least not unqualified, if it isn't an actual Python list.)

Calling PyWideStringList_Append() or PyWideStringList_Insert() on a
"constant" list will crash: don't do that :-)

I tried to explain the subtle details of "constant" vs "dynamic"
configurations in the "Initialization with constant PyConfig" and "Memory
allocations and Py_DecodeLocale()" sections.

A "constant" PyWideStringList must not be used with a "dynamic"
PyConfig: otherwise, PyConfig_Clear() will crash as well.

I would prefer to have separate "const PyWideStringList" and "const
PyConfig" types, but the C language doesn't convert "wchar_t*" to
"const wchar_t*" when you do that. We would need duplicated
PyConstantWideStringList and PyConstantConfig structures, which would
have to be cast to PyWideStringList and PyConfig internally to
reuse the same code for constant and dynamic configurations.

If you consider that the specific case of "constant configuration"
adds too much burden / complexity, we might remove it and always
require the use of dynamic configuration.

Right now, Programs/_testembed.c uses almost only "constant"
configurations. Using dynamic memory would make the code longer: it
would need to handle memory allocation failures.


> I understand the desire to make static allocation and initialisation possible, but since you only need PyWideStringList for PyConfig, not PyPreConfig (which sets the allocator), perhaps having a PyWideStringList_Init(), which copies memory, and PyWideStringList_Clear() to clear it, would be better?

Do you mean always requiring dynamically built lists? Said
differently, not allowing code like the following?

    static wchar_t* argv[] = {
        L"python3",
        L"-c",
        L"pass",
        L"arg2",
    };

    _PyCoreConfig config = _PyCoreConfig_INIT;
    config.argv.length = Py_ARRAY_LENGTH(argv);
    config.argv.items = argv;
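
For comparison, the fully dynamic variant would look something like the
following (just a sketch: it assumes that PyWideStringList_Append() reports
failures through a PyInitError, like the other PEP 587 functions):

    PyConfig config = PyConfig_INIT;
    static const wchar_t* args[] = {L"python3", L"-c", L"pass", L"arg2"};

    for (size_t i = 0; i < Py_ARRAY_LENGTH(args); i++) {
        PyInitError err = PyWideStringList_Append(&config.argv, args[i]);
        if (Py_INIT_FAILED(err)) {
            PyConfig_Clear(&config);
            Py_ExitInitError(err);
        }
    }
    /* ... Py_InitializeFromConfig(&config), then PyConfig_Clear(&config) ... */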


>> If the private field "_init_main" of the PEP 587 is set to 0,
>> Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
>> already implemented!). But I didn't implement yet a
>> _Py_InitializeMain() function to "finish" the initialization. Let's
>> say that it exists, we would get:
>>
>> ---
>> PyConfig config = PyConfig_INIT;
>> config._init_main = 0;
>> PyInitError err = Py_InitializeFromConfig(&config);
>> if (Py_INIT_FAILED(err)) {
>>     Py_ExitInitError(err);
>> }
>>
>> /* add your code to customize Python here */
>> /* calling PyRun_SimpleString() here is safe */
>>
>> /* finish Python initialization */
>> err = _Py_InitializeMain();
>> if (Py_INIT_FAILED(err)) {
>>     Py_ExitInitError(err);
>> }
>> ---
>>
>> Would it solve your use case?
>
>
> FWIW, I understand the need here: for Hermetic Python, we solved it by adding a new API similar to PyImport_AppendInittab, but instead registering a generic callback hook to be called *during* the initialisation process: after the base runtime and the import mechanism are initialised (at which point you can create Python objects), but before *any* modules are imported. We use that callback to insert a meta-importer that satisfies all stdlib imports from an embedded archive. (Using a meta-importer allows us to bypass the filesystem altogether, even for what would otherwise be failed path lookups.)
>
> As I mentioned, Hermetic Python was originally written for Python 2.7, but this approach works fine with a frozen importlib as well. The idea of 'core' and 'main' initialisation will likely work for this, as well.

Well, even if it's not part of PEP 587, I just implemented it
anyway while fixing a bug:
https://github.com/python/cpython/commit/9ef5dcaa0b3c7c7ba28dbb3ec0c9507d9d05e3a9

Example:

static int test_init_main(void)
{
    _PyCoreConfig config = _PyCoreConfig_INIT;
    configure_init_main(&config);
    config._init_main = 0;

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    /* sys.stdout doesn't exist yet: it is created by _Py_InitializeMain() */
    int res = PyRun_SimpleString(
        "import sys; "
        "print('Run Python code before _Py_InitializeMain', "
               "file=sys.stderr)");
    if (res < 0) {
        exit(1);
    }

    err = _Py_InitializeMain();
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    return _Py_RunMain();
}

As you can see, it's possible to execute Python code between the "core"
and "main" initialization phases. Moreover, I even fixed Python to be able
to use "import sys" before the "main" initialization phase ;-) (Only builtin
and frozen modules are available at this stage.)

Again, I'm not comfortable making PyConfig._init_main and
_Py_InitializeMain() public, because I consider them too
experimental and we don't have enough time to discuss what exactly the
"core" initialization phase is.


> Other questions/comments about PEP 587:
>
> I really like the PyInitError struct. I would like more functions to use it, e.g. the PyRun_* "very high level" API, which currently calls exit() for you on SystemExit, and returns -1 without any other information on error. For those, I'm not entirely sure 'Init' makes sense in the name... but I can live with it.

The PyInitError structure can be renamed to PyError, but it should only
be used with functions which can exit Python. In short, are you talking
about "The Very High Level Layer" of the C API?
https://docs.python.org/dev/c-api/veryhigh.html

One issue is that I dislike adding new functions to the C API, but it
seems like we should add a few to provide a better API for embedding
Python. libpython must never exit the process! (or only when you
explicitly ask for it :-))

Note: PyRun_SimpleStringFlags() is a wrapper which makes
PyRun_StringFlags() easier to use. PyRun_StringFlags() doesn't handle
the exception and so lets you decide how to handle it.
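
For example, an embedder who wants to keep control can do something like
this (minimal sketch, most error checks omitted):

    PyObject *globals = PyDict_New();
    PyDict_SetItemString(globals, "__builtins__", PyEval_GetBuiltins());

    PyObject *res = PyRun_StringFlags("import sys; sys.exit(2)",
                                      Py_file_input, globals, globals, NULL);
    if (res == NULL) {
        /* The exception (here SystemExit) is still set: the embedder
           decides what to do with it, libpython doesn't call exit(). */
        PyErr_Clear();
    }
    else {
        Py_DECREF(res);
    }
    Py_DECREF(globals);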


> A couple of things are documented as performing pre-initialisation (PyConfig_SetBytesString, PyConfig_SetBytesArgv). I understand why, but I feel like that might be confusing and error-prone. Would it not be better to have them fail if pre-initialisation hasn't been performed yet?

It's easier to modify the code to fail with an error if Python is not
pre-initialized.

I propose to implicitly pre-initialize Python to make the API easier
to use. In practice, you rarely have to explicitly pre-initialize
Python. The default PyPreConfig is just fine for almost all use cases,
especially since Nick Coghlan and I decided to disable C locale
coercion and UTF-8 Mode by default. You now have to opt in to enable
these encoding features.
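
If an embedder does want those features, opting in explicitly stays short
(using the pre-initialization API from the PEP):

    PyPreConfig preconfig = PyPreConfig_INIT;
    preconfig.utf8_mode = 1;         /* opt in to UTF-8 Mode */
    preconfig.coerce_c_locale = 1;   /* opt in to C locale coercion */

    PyInitError err = Py_PreInitialize(&preconfig);
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }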


> The buffered_stdio field of PyConfig mentions stdout and stderr, but not stdin. Does it not affect stdin?

Extract of create_stdio():

    /* stdin is always opened in buffered mode, first because it shouldn't
       make a difference in common use cases, second because TextIOWrapper
       depends on the presence of a read1() method which only exists on
       buffered streams.
    */

Note: Unbuffered stdin doesn't magically make the producer on the
other side of a pipe flush its (stdout/stderr) buffer more
frequently :-)


> (Many of the fields could do with a bit more explicit documentation, to be honest.)

Well, 2 years ago, almost no configuration parameter was documented
:-) I helped to document "Global configuration variables" at:
https://docs.python.org/dev/c-api/init.html#global-configuration-variables

I had to reverse engineer the code to be able to document it :-D

Right now, my reference documentation lives in
Include/cpython/coreconfig.h. Some parameters are better documented
there than in the PEP. I can try to enhance the documentation in the
PEP.


> The configure_c_stdio field of PyConfig sounds like it might not set sys.stdin/stdout/stderr. That would be new behaviour, but configure_c_stdio doesn't have an existing equivalence, so I'm not sure if that's what you meant or not.

In Python 3.7, only Py_Main() configured C standard streams.

I moved the code into _PyCoreConfig_Write() which is called by
_Py_InitializeFromConfig() and so by Py_Initialize() as well.

My intent is to be able to get the same behavior using Py_Initialize()
+ Py_RunMain() as using Py_Main().

Said differently, Python 3.8 now always configures C standard streams.
Maybe I should modify the configure_c_stdio default value to 0, and
only enable it by default in Py_Main()?

Honestly, I'm a little bit confused here. I'm not sure what the
expected behavior is. Usually, in case of doubt, I look at the behavior
before my refactoring. The old behavior was that only Py_Main()
configured the C standard streams. Maybe I should restore that behavior.

But to build a customized Python which should behave like the regular
Python, you would want to opt in to configure_c_stdio=1.

Maybe we need a function to set the configuration to get "regular
Python" behavior?

Something like: PyConfig_SetRegularPythonBehavior()? (sorry for the silly name!)
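
Purely hypothetical sketch of what such a helper could do:

    /* Hypothetical, not part of the PEP: opt in to the behavior of the
       regular "python3" executable for parameters whose embedding
       default differs. */
    void PyConfig_SetRegularPythonBehavior(PyConfig *config)
    {
        config->configure_c_stdio = 1;
        /* ... plus any other parameter where the embedding default should
           differ from the python3 program ... */
    }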


> The dll_path field of PyConfig says "Windows only". Does that mean the struct doesn't have that field except in a Windows build? Or is it ignored, instead? If it doesn't have that field at all, what #define can be used to determine if the PyConfig struct will have it or not?

The field doesn't exist on non-Windows platforms.

I chose to expose it to let the developer choose where Python looks for DLLs.

But Steve just said (in an email below) that there is no reason to
make it configurable. In that case, I will make it internal again. It
seems like I misunderstood the purpose of this parameter.


> It feels a bit weird to have both 'inspect' and 'interactive' in PyConfig. Is there a substantive difference between them? Is this just so you can easily tell if any of run_module / run_command / run_filename are set?

In Python 3.7, there are Py_InspectFlag and Py_InteractiveFlag.

If "interactive" parameter is non-zero, C standard streams are
configured as buffered. It is also used to decide if stdin is
considered as interactive or not:

/* Return non-zero if stdin is a TTY or if the -i command line option is used */
static int
stdin_is_interactive(const _PyCoreConfig *config)
{
    return (isatty(fileno(stdin)) || config->interactive);
}

The "inspect" parameter is used to decide if we start a REPL or not.

The "-i" command line option sets inspect (Py_InspectFlag) and
interactive (Py_InteractiveFlag) to 1.

These flags are exposed at Python level as sys.flags.inspect and
sys.flags.interactive.

... Honestly, I'm not sure if there is a real difference between these
two flags, but they are exposed and have existed for years... so I
decided to keep them.


> "module_search_path_env" sounds like an awkward and somewhat misleading name for the translation of PYTHONPATH. Can we not just use, say, pythonpath_env? I expect the intended audience to know that PYTHONPATH != sys.path.

Sure, I can rename it.


> The module_search_paths field in PyConfig doesn't mention if it's setting or adding to the calculated sys.path. As a whole, the path-calculation bits are a bit under-documented.

Py_InitializeFromConfig() sets sys.path from module_search_paths.

sys.path doesn't exist before Py_InitializeFromConfig() is called.


> Since this is an awkward bit of CPython, it wouldn't hurt to mention what "the default path configuration" does (i.e. search for python's home starting at program_name, add fixed subdirs to it, etc.)

Oh, that's a big task :-) Nobody knows what getpath.c and getpathp.c do :-D


> Path configuration is mentioned as being able to issue warnings, but it doesn't mention *how*. It can't be the warnings module at this stage. I presume it's just printing to stderr.

First, I didn't know either, but I just checked: it's only done on Unix
(getpath.c). On Windows (getpathp.c), no warning is emitted.

The warning is written to the C stderr.

The flag isn't new: it's based on Py_FrozenFlag. When I looked at how
Python is embedded, I was surprised by the number of applications
setting Py_FrozenFlag to 1 to suppress these warnings.
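
In other words, embedders were doing this just to silence the path
calculation warnings on stderr:

    Py_FrozenFlag = 1;   /* pre-PEP 587 way to suppress getpath.c warnings */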


> Regarding Py_RunMain(): does it do the right thing when something calls PyErr_Print() with SystemExit set? (I mentioned last week that PyErr_Print() will call C's exit() in that case, which is obviously terrible for embedders.)

I spent a significant amount of time ensuring that
Py_InitializeFromConfig() and Py_RunMain() don't exit directly, but
return a proper failure or exit code. For example, Python 3.6 contains
around 319 calls to Py_FatalError(). The master branch contains around
181 calls to Py_FatalError(): still a lot, but I converted 138 calls
to _Py_INIT_ERR() ;-)

The work is not complete: I just checked, and Py_RunMain() still calls
PyErr_Print() directly in many places. Well, the code can be fixed,
and it's not directly related to the PEP, is it? The issue already
existed in Python 3.7 with Py_Main().


> Regarding isolated_mode and the site module, should we make stronger guarantees about site.py's behaviour being optional? The problem with site is that it does four things that aren't configurable, one of which is usually very desirable, one of which probably doesn't matter to embedders, and two that are iffy: sys.path deduplication and canonicalisation (and fixing up __file__/__cached__ attributes of already-imported modules); adding site-packages directories; looking for and importing sitecustomize.py; executing .pth files. The site module doesn't easily allow doing only some of these. (user-site directories are an exception, as they have their own flag, so I'm not listing that here.) With Hermetic Python we don't care about any of these (for a variety of different reasons), but I'm always a little worried that future Python versions would add behaviour to site that we *do* need.

Honestly, I would prefer to simply remove the site module, I dislike
it because it makes Python startup way slower :-) ... But well, it
does a few important things :-)

About PEP 587: PyConfig.user_site_directory is exported as
sys.flags.no_user_site (negated value), which is used by the site
module.

I'm not sure if you are asking me to modify my PEP, or if it's more of
a general remark. PEP 587 gives control over how sys.path is
initialized.

In the "Isolate Python" section, I suggest to set the "isolated"
parameter to 1 which imply setting user_site_directory to 0. So
sys.path isn't modified afterwards. What you pass to PyConfig is what
you get in sys.path in this case.
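
Roughly, a sketch based on that section of the PEP (it relies on the
behavior described above: the paths passed in module_search_paths are
what ends up in sys.path):

    PyConfig config = PyConfig_INIT;
    config.isolated = 1;    /* also implies user_site_directory = 0 */

    PyInitError err = PyWideStringList_Append(&config.module_search_paths,
                                              L"/opt/myapp/stdlib");
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }
    err = Py_InitializeFromConfig(&config);
    if (Py_INIT_FAILED(err)) {
        Py_ExitInitError(err);
    }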

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.

