[Python-Dev] RFC: PEP 587 "Python Initialization Configuration": 3rd version

Thu May 16 08:02:49 EDT 2019

On Thu, May 16, 2019 at 06:34, Gregory Szorc <gregory.szorc at gmail.com> wrote:
> > I know that the PEP is long, but well, it's a complex topic, and I
> > chose to add many examples to make the API easier to understand.
>
> I saw your request for feedback on Twitter a few days back and found
> this thread.
>
> This PEP is of interest to me because I'm the maintainer of PyOxidizer -
> a project for creating single file executables embedding Python.

Aha, interesting :-)

> As part
> of hacking on PyOxidizer, I will admit to grumbling about the current
> state of the configuration and initialization mechanisms. The reliance
> on global variables and the haphazard way in which you must call certain
> functions before others was definitely a bit frustrating to deal with.

Yeah, that's what I tried to explain in the PEP 587 Rationale.

> My most important piece of feedback is: thank you for tackling this!
> Your work to shore up the inner workings of interpreter state and
> management is a big deal on multiple dimensions. I send my sincere
> gratitude.

You're welcome ;-)

> PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> weird to me. Specifically, the `PyPreConfig preconfig =
> PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> being implemented as macros. But I think they should look like function
> calls so the door is open to converting them to function calls in the
> future.

Ah yes, I noticed that some projects can only import symbols and cannot
use the C API directly. You're right that such a macro can be an issue.

Would you be OK with a "PyConfig_Init(PyConfig *config);" function
which would initialize all fields to their default values? Maybe
PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.

You can find a similar API for pthread mutexes: there is an init function
*and* a macro for static initialization:

       int pthread_mutex_init(pthread_mutex_t *restrict mutex,
           const pthread_mutexattr_t *restrict attr);

       pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
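
As a rough illustration of how the same pairing could look for a
configuration struct, here is a minimal sketch. DemoConfig, its three
fields, their default values and both initializer names are hypothetical
stand-ins, not the PEP 587 API:

```c
#include <string.h>

/* Hypothetical stand-in for PyConfig; the real struct has many more fields. */
typedef struct {
    int isolated;
    int use_environment;
    int parse_argv;
} DemoConfig;

/* Static initializer: only usable where a C initializer is allowed. */
#define DemoConfig_STATIC_INIT \
    { .isolated = 0, .use_environment = 1, .parse_argv = 1 }

/* Function initializer: an exported symbol that callers unable to use
   C macros (for example through an FFI) can still call. */
void DemoConfig_Init(DemoConfig *config)
{
    memset(config, 0, sizeof(*config));
    config->use_environment = 1;
    config->parse_argv = 1;
}
```

Both forms have to be kept in sync by hand, which is the main cost of
offering the two of them.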

> PyPreConfig.allocator being a char* seems a bit weird. Does this imply
> having to use strcmp() to determine which allocator to use? Perhaps the
> allocator setting should be an int mapping to a constant instead?

Yes, _PyMem_SetupAllocators() uses strcmp(). There are 6 supported values:

* "default"
* "debug"
* "pymalloc"
* "pymalloc_debug"
* "malloc"
* "malloc_debug"

Note: pymalloc and pymalloc_debug are not supported if Python is
explicitly configured using --without-pymalloc.

I think that I chose to use a string because the feature was first
implemented using an environment variable.

Actually, I *like* the idea of avoiding a string in PyPreConfig because
a string might need memory allocation, whereas the pre-initialization
is supposed to configure memory allocation :-) I will change the type
to an enum.
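
A minimal sketch of what such an enum could look like, together with a
helper that maps the existing string names onto it. The enum, its
constant names and demo_parse_allocator() are all hypothetical; only the
six string values come from the list above:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the names are illustrative, not a final API. */
typedef enum {
    DEMO_ALLOCATOR_NOT_SET = 0,   /* leave the memory allocator unmodified */
    DEMO_ALLOCATOR_DEFAULT,
    DEMO_ALLOCATOR_DEBUG,
    DEMO_ALLOCATOR_PYMALLOC,
    DEMO_ALLOCATOR_PYMALLOC_DEBUG,
    DEMO_ALLOCATOR_MALLOC,
    DEMO_ALLOCATOR_MALLOC_DEBUG
} DemoAllocatorName;

/* Map a name (for example read from an environment variable) to the enum.
   Returns 0 on success, -1 if the name is unknown; *result is only
   written on success. No memory allocation is needed. */
int demo_parse_allocator(const char *name, DemoAllocatorName *result)
{
    static const struct {
        const char *name;
        DemoAllocatorName value;
    } table[] = {
        {"default", DEMO_ALLOCATOR_DEFAULT},
        {"debug", DEMO_ALLOCATOR_DEBUG},
        {"pymalloc", DEMO_ALLOCATOR_PYMALLOC},
        {"pymalloc_debug", DEMO_ALLOCATOR_PYMALLOC_DEBUG},
        {"malloc", DEMO_ALLOCATOR_MALLOC},
        {"malloc_debug", DEMO_ALLOCATOR_MALLOC_DEBUG},
    };
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(name, table[i].name) == 0) {
            *result = table[i].value;
            return 0;
        }
    }
    return -1;
}
```

With this shape, strcmp() is only needed at the boundary where a string
comes in; the configuration struct itself stores a plain integer.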

> Relatedly, how are custom allocators registered? e.g. from Rust, I want
> to use Rust's allocator. How would I do that in this API? Do I still
> need to call PyMem_SetAllocator()?

By default, PyPreConfig.allocator is set to NULL. In that case,
_PyPreConfig_Write() leaves the memory allocator unmodified.

Like PyImport_AppendInittab() and PyImport_ExtendInittab(),
PyMem_SetAllocator() remains relevant and continues to work as
before.

Example to set your custom allocator:
---
PyInitError err = Py_PreInitialize(NULL);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, my_cool_allocator);
---

Well, it also works in the opposite order, but I prefer to call
PyMem_SetAllocator() after the pre-initialization to make it more
explicit :-)
---
PyMem_SetAllocator(PYMEM_DOMAIN_MEM, my_cool_allocator);
PyInitError err = Py_PreInitialize(NULL);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
---
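
For readers wondering what "my_cool_allocator" might contain, here is a
hedged sketch of an allocator table. DemoAllocatorEx is a stand-in that
repeats the fields of PyMemAllocatorEx so that the example compiles
without Python.h; DemoStats and the demo_* functions are hypothetical. In
a real embedding you would fill a PyMemAllocatorEx instead and pass its
address to PyMem_SetAllocator():

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in with the same fields as CPython's PyMemAllocatorEx. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr);
} DemoAllocatorEx;

/* Hypothetical context: counts malloc/calloc calls. */
typedef struct {
    size_t allocations;
} DemoStats;

static void *demo_malloc(void *ctx, size_t size)
{
    ((DemoStats *)ctx)->allocations++;
    /* A zero-byte request must still return a distinct non-NULL pointer. */
    return malloc(size ? size : 1);
}

static void *demo_calloc(void *ctx, size_t nelem, size_t elsize)
{
    ((DemoStats *)ctx)->allocations++;
    if (nelem == 0 || elsize == 0) {
        nelem = 1;
        elsize = 1;
    }
    return calloc(nelem, elsize);
}

static void *demo_realloc(void *ctx, void *ptr, size_t new_size)
{
    (void)ctx;
    return realloc(ptr, new_size ? new_size : 1);
}

static void demo_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Build the table; stats must outlive every use of the allocator. */
DemoAllocatorEx demo_make_allocator(DemoStats *stats)
{
    DemoAllocatorEx alloc = {
        stats, demo_malloc, demo_calloc, demo_realloc, demo_free
    };
    return alloc;
}
```

The ctx pointer is where a Rust (or any other) allocator handle could be
carried through to each callback.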

> I thought a point of this proposal
> was to consolidate per-interpreter config settings?

Right. But PyMem_SetAllocator() uses the PyMemAllocatorDomain enum and
the PyMemAllocatorEx structure, which are not really "future-proof". For
example, I already replaced PyMemAllocator with PyMemAllocatorEx to
add "calloc". We might extend it once more later to add an allocator
with a specific memory alignment (even if the issue is now closed):

https://bugs.python.org/issue18835

I consider that PyMem_SetAllocator() is too specific to be added to PyPreConfig.

Are you fine with that?

> I'm a little confused about the pre-initialization functions that take
> command arguments. Is this intended to only be used for parsing the
> arguments that `python` recognizes? Presumably a custom application
> embedding Python would never use these APIs unless it wants to emulate
> the behavior of `python`? (I suppose this can be clarified in the API
> docs once this is implemented.)

Yes, Py_PreInitializeFromArgs() parses -E, -I, -X dev and -X utf8 options:
https://www.python.org/dev/peps/pep-0587/#command-line-arguments

Extract from my "Isolate Python" section:

"The default configuration is designed to behave as a regular Python.
To embed Python into an application, it's possible to tune the
configuration to better isolated the embedded Python from the system:
(...)"

https://www.python.org/dev/peps/pep-0587/#isolate-python

I wasn't sure if I should mention parse_argv=0 in this section or not.
According to what you wrote, I should :-)

*Maybe* rather than documenting how to isolate Python, we might even
provide a function for that?

void PyConfig_Isolate(PyConfig *config)
{ config->isolated = 1; config->parse_argv = 0; }

I didn't propose that because so far, I'm not sure that everybody has
the same opinion on what "isolation" means. Does it only mean ignoring
environment variables? Or also ignoring configuration files? What about
the path configuration?

That's why I propose to start without such an opinionated
PyConfig_Isolate() function :-)

> What about PyImport_FrozenModules? This is a global variable related to
> Python initialization (it contains _frozen_importlib and
> _frozen_importlib_external) but it is not accounted for in the PEP.
> I rely on this struct in PyOxidizer to replace the importlib modules with
> custom versions so we can do 0-copy in-memory import of Python bytecode
> for the entirety of the standard library. Should the PyConfig have a
> reference to the _frozen[] to use? Should the _frozen struct be made
> part of the public API?

First of all, PEP 587 is designed to be easily extensible :-) I even
added a _config_version field to provide backward ABI compatibility.
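
To show the general technique of a version field (not how CPython
actually uses _config_version, which this email does not describe), here
is a small sketch. Both struct layouts and demo_get_parse_argv() are
hypothetical:

```c
#include <stddef.h>

/* Hypothetical layout used by callers built against release 1. */
typedef struct {
    int config_version;   /* always the first field */
    int isolated;
} DemoConfigV1;

/* Hypothetical layout of release 2: same prefix, one new field. */
typedef struct {
    int config_version;
    int isolated;
    int parse_argv;       /* added in version 2 */
} DemoConfigV2;

/* Read parse_argv from a config built against either release.
   The version field tells the library which fields really exist. */
int demo_get_parse_argv(const void *config)
{
    int version = *(const int *)config;
    if (version >= 2) {
        return ((const DemoConfigV2 *)config)->parse_argv;
    }
    return 1;   /* default for callers compiled against version 1 */
}
```

An application compiled against the old layout keeps working with a
newer library because the library never reads past the fields that the
recorded version guarantees.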

Honestly, I never looked at PyImport_FrozenModules. It seems to fall
into the same category as "inittab": a kind of corner case
which cannot be easily generalized into the PyConfig structure.

I would say the same as what I wrote about
PyImport_AppendInittab(): the PyImport_FrozenModules symbol remains
relevant and continues to work as expected. I understand that it must
be set before the initialization, and it seems safe to set it even
before the pre-initialization since it's a static array.

Note: I renamed PyConfig._frozen to PyConfig.pathconfig_warnings: it's
an int and it's unrelated to PyImport_FrozenModules.

> I rely on this struct in PyOxidizer to replace the importlib modules with
> custom versions so we can do 0-copy in-memory import of Python bytecode
> for the entirety of the standard library.

Wait, that sounds like a cool feature! Would it make sense to
upstream this feature? If yes, maybe send a separate email to
python-dev and/or open an issue.

> The PEP mentions a private PyConfig._install_importlib member. I'm
> curious what this is because it may be relevant to PyOxidizer. FWIW I
> /might/ be interested in a mechanism to better control importlib
> initialization because PyOxidizer is currently doing dirty things at
> run-time to register the custom 0-copy meta path importer. I /think/ my
> desired API would be a mechanism to control the name(s) of the frozen
> module(s) to use to bootstrap importlib. Or there would be a way to
> register the names of additional frozen modules to import and run as
> part of initializing importlib (before any .py-based stdlib modules are
> imported). Then PyOxidizer wouldn't need to hack up the source code to
> importlib, compile custom bytecode, and inject it via
> PyImport_FrozenModules. I concede this may be out of scope for the PEP.
> But if the API is being reworked, I'd certainly welcome making it easier
> for tools like PyOxidizer to work their crazy module importing magic :)

PEP 587 is an incomplete implementation of PEP 432. We are
discussing with Nick Coghlan, Steve Dower and some others about having
2 phases for the Python initialization: "core" and "main". The "core"
phase would provide a bare minimum working Python: builtin exceptions
and types, maybe builtin imports, and that's basically all. It would
allow configuring Python using the newly created interpreter, for
example by running Python code.

The problem is that these 2 phases are not well defined yet; this is
still under discussion. Nick and I agreed to start with PEP 587 as a
first milestone, and see later how to implement "core" and "main"
phases.

If the private field "_init_main" of PEP 587 is set to 0,
Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
already implemented!). But I haven't yet implemented a
_Py_InitializeMain() function to "finish" the initialization. Let's
say that it exists; we would get:

---
PyConfig config = PyConfig_INIT;
config._init_main = 0;
PyInitError err = Py_InitializeFromConfig(&config);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}

/* add your code to customize Python here */
/* calling PyRun_SimpleString() here is safe */

/* finish Python initialization */
err = _Py_InitializeMain(&config);
if (Py_INIT_FAILED(err)) {
    Py_ExitInitError(err);
}
---

Would it solve your use case?

Sorry, I didn't properly understand what you mean by "controlling the
names of the frozen modules to use to bootstrap importlib".

> I really like the new Py_RunMain() API and associated PyConfig members.
> I also invented this wheel in PyOxidizer and the new API should result
> in me deleting some code that I wish I didn't have to write in the first
> place :)

Great!

> I invented a data structure for representing a Python interpreter
> configuration. And the similarities to PyConfig are striking. I think
> that's a good sign :)

He he :-)

> It might be useful to read through that file -
> especially the init function (line with `pub fn init`) to see if
> anything I'm doing pushes the boundaries of the proposed API. Feel free
> to file GitHub issues if you see obvious bugs with PyOxidizer's Python
> initialization logic while you're at it :)

Your link didn't work, but I found:
https://github.com/indygreg/PyOxidizer/blob/master/pyoxidizer/src/pyembed/pyinterp.rs

"write_modules_directory_env" seems very specific to your needs. Apart
from that, I confirm that PythonConfig is very close to PEP 587
PyConfig! I notice that you also avoided double negation, thanks ;-)

/* Pre-initialization functions we could support:
*
* PyObject_SetArenaAllocator()
* PySys_AddWarnOption()
* PySys_AddXOption()
* PySys_ResetWarnOptions()
*/

Apart from PyObject_SetArenaAllocator(), PyConfig implements the 3 other functions.

Again, like PyMem_SetAllocator(), PyObject_SetArenaAllocator() remains
relevant and can be used with the pre-initialization.

PySys_SetObject("argv", obj) is covered by PyConfig.argv.

PySys_SetObject("argvb", obj): I'm not sure why you are doing that.
It's easy to retrieve sys.argv as bytes; it's now even documented:
https://docs.python.org/dev/library/sys.html#sys.argv

---

Sorry, I'm not an importlib expert. I'm not sure what could be done in
PEP 587 for your specific importlib changes.

> Also, one thing that tripped me up a few times when writing PyOxidizer
> was managing the lifetimes of memory that various global variables point
> to. The short version is I was setting Python globals to point to memory
> allocated by Rust and I managed to crash Python by freeing memory before
> it should have been. Since the new API seems to preserve support for
> global variables, I'm curious if there are changes to how memory must be
> managed. It would be really nice to get to a state where you only need
> to ensure the PyConfig instance and all its referenced memory only needs
> to outlive the interpreter it configures. That would make the memory
> lifetimes story intuitive and easily compatible with Rust.

For the specific case of PyConfig, you have to call
PyConfig_Clear(config) after calling Py_InitializeFromConfig().
Python keeps its own copy of your configuration (and fills in the
missing fields, if needed).
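
The resulting ownership rule can be sketched without Python.h. All names
below (DemoOwnedConfig, DemoRuntime and their functions) are
hypothetical; the point is only that the runtime copies what it needs,
so the caller's config does not have to outlive it:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical config that owns one heap-allocated string. */
typedef struct {
    char *program_name;
} DemoOwnedConfig;

/* Hypothetical runtime that keeps its own copy of the config. */
typedef struct {
    char *program_name;
} DemoRuntime;

static char *demo_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Store a copy of value in the config. Returns 0, or -1 on no memory. */
int DemoOwnedConfig_SetString(DemoOwnedConfig *config, const char *value)
{
    char *copy = demo_strdup(value);
    if (copy == NULL) {
        return -1;
    }
    free(config->program_name);
    config->program_name = copy;
    return 0;
}

/* Release everything the config owns. */
void DemoOwnedConfig_Clear(DemoOwnedConfig *config)
{
    free(config->program_name);
    config->program_name = NULL;
}

/* Copy the config and fill in a missing field with a default. */
int DemoRuntime_Init(DemoRuntime *rt, const DemoOwnedConfig *config)
{
    const char *name = config->program_name ? config->program_name : "python";
    rt->program_name = demo_strdup(name);
    return rt->program_name ? 0 : -1;
}

void DemoRuntime_Finalize(DemoRuntime *rt)
{
    free(rt->program_name);
    rt->program_name = NULL;
}
```

The caller sets fields, initializes the runtime, then clears the config
right away; the runtime's copy stays valid until it is finalized.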

I modified a lot of functions to ensure that Python cleans up more
globals at exit in Py_Finalize() and at the end of Py_Main() /
Py_RunMain().

I'm not sure if that answers your question. If you want a more
specific answer, can you please give more concrete examples of globals?

There is also an ongoing refactoring to move globals into
_PyRuntimeState and PyInterpreterState: a change needed to support
subinterpreters, see Eric Snow's PEP 554.

> One feature that I think is missing from the proposal (and this is
> related to the previous paragraph) is the ability to prevent config
> fallback to things that aren't PyConfig and PyPreConfig. There is
> `PyConfig.parse_argv` to disable command argument parsing and
> `PyConfig.use_environment` to disable environment variable fallback. But
> AFAICT there is no option to disable configuration file fallback nor
> global variable fallback.

If you embed Python, you control global configuration variables, no? I
chose to design PyConfig to inherit global configuration variables
because it allows supporting both ways to configure Python using a
single implementation.

Would you prefer an explicit PyConfig_SetDefaults(config) which would
completely ignore global configuration variables?
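
The difference between the two behaviours can be sketched as follows.
Demo_VerboseFlag is a hypothetical global in the spirit of
Py_VerboseFlag, and both functions are illustrative only:

```c
/* Hypothetical legacy global configuration variable. */
int Demo_VerboseFlag = 0;

/* Hypothetical config with a single field. */
typedef struct {
    int verbose;
} DemoFlagsConfig;

/* Inherit: start from whatever the legacy global currently holds. */
void DemoFlagsConfig_InitFromGlobals(DemoFlagsConfig *config)
{
    config->verbose = Demo_VerboseFlag;
}

/* Explicit defaults: ignore the legacy global entirely. */
void DemoFlagsConfig_SetDefaults(DemoFlagsConfig *config)
{
    config->verbose = 0;
}
```

With the first function, code that still sets the global keeps working;
with the second, the struct is the only source of configuration.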

See the Lib/test/test_embed.py unit tests, which use Programs/_testembed.c:
https://github.com/python/cpython/blob/master/Programs/_testembed.c

python._pth (Windows only), pybuilddir.txt (Unix only) and pyvenv.cfg
configuration files are only used by the function building the "Path
Configuration".

Using PEP 587, you can now completely ignore this function:
https://www.python.org/dev/peps/pep-0587/#path-configuration

> Again, this proposal is terrific overall and so much better than what we
> have today. The wall of text I just wrote is disproportionate in size to
> the quality of the PEP. I almost feel bad writing so much feedback for
> such a terrific PEP ;)
>
> Excellent work, Victor. I can't wait to see these changes materialize!

Thanks :-)

Thanks for your very interesting feedback. It's really helpful to see
how the API is used "for real" :-)

Victor