[Python-Dev] RFC: PEP 587 "Python Initialization Configuration": 3rd version

Thomas Wouters thomas at python.org
Thu May 16 10:10:06 EDT 2019


On Thu, May 16, 2019 at 2:03 PM Victor Stinner <vstinner at redhat.com> wrote:

> On Thu, May 16, 2019 at 06:34, Gregory Szorc <gregory.szorc at gmail.com>
> wrote:
> > > I know that the PEP is long, but well, it's a complex topic, and I
> > > chose to add many examples to make the API easier to understand.
> >
> > I saw your request for feedback on Twitter a few days back and found
> > this thread.
> >
> > This PEP is of interest to me because I'm the maintainer of PyOxidizer -
> > a project for creating single file executables embedding Python.
>
> Aha, interesting :-)
>

Just for some context to everyone: Gregory's PyOxidizer is very similar to
Hermetic Python, the thing we use at Google for all Python programs in our
mono-repo. We had a short email discussion facilitated by Augie Fackler,
who wants to use PyOxidizer for Mercurial, about how Hermetic Python works.

At the PyCon sprints last week, I sat down with Victor, Steve Dower and
Eric Snow, showing them how Hermetic Python embeds CPython, and what hoops
it has to jump through and what issues we encountered. I think most of
those issues would also apply to PyOxidizer, although it sounds like Gregory
solved some of the issues a bit differently. (Hermetic Python was
originally written for Python 2.7, so it doesn't try to deal with
importlib's bootstrapping, for example.)

I have some comments and questions about the PEP as well, some of which
overlap with Gregory's or Victor's answers:

[...]

> > PyPreConfig_INIT and PyConfig_INIT as macros that return a struct feel
> > weird to me. Specifically, the `PyPreConfig preconfig =
> > PyPreConfig_INIT;` pattern doesn't feel right. I'm sort of OK with these
> > being implemented as macros. But I think they should look like function
> > calls so the door is open to converting them to function calls in the
> > future.
>
> Ah yes, I noticed that some projects can only import symbols, not use
> the C API directly. You're right that such a macro can be an issue.
>
> Would you be ok with a "PyConfig_Init(PyConfig *config);" function
> which would initialize all fields to their default values? Maybe
> PyConfig_INIT should be renamed to PyConfig_STATIC_INIT.
>
> You can find a similar API for pthread mutex, there is a init function
> *and* a macro for static initialization:
>
>        int pthread_mutex_init(pthread_mutex_t *restrict mutex,
>            const pthread_mutexattr_t *restrict attr);
>
>        pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
>

This was going to be my suggestion as well: for any non-trivial macro, we
should have a function for it instead. I would also point out that PEP 587
has a code example that uses PyWideStringList_INIT, but that macro isn't
mentioned anywhere else. The PEP is a bit unclear as to the semantics of
PyWideStringList as a whole: the example uses a static array with length,
but doesn't explain what would happen with statically allocated data like
that if you call the Append or Extend functions. It also doesn't cover how
e.g. argv parsing would remove items from the list. (I would also suggest
the PEP shouldn't use the term 'list', at least not unqualified, if it
isn't an actual Python list.)

I understand the desire to make static allocation and initialisation
possible, but since you only need PyWideStringList for PyConfig, not
PyPreConfig (which sets the allocator), perhaps having a
PyWideStringList_Init(), which copies memory, and PyWideStringList_Clear()
to clear it, would be better?
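
To make the copy-on-init idea concrete, here is a minimal sketch in plain C.
It deliberately does not use the CPython API: WStrList, wstrlist_init(),
wstrlist_append() and wstrlist_clear() are hypothetical names invented for
illustration, not the PEP 587 definitions. The point is only that once Init
copies the caller's (possibly static) array, Append and Clear never have to
wonder whether an item may be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for PyWideStringList; NOT the PEP 587 struct. */
typedef struct {
    size_t length;
    wchar_t **items;   /* always heap-allocated and owned by the list */
} WStrList;

/* Duplicate one wide string on the heap; NULL on allocation failure. */
static wchar_t *wstr_dup(const wchar_t *s)
{
    size_t n = wcslen(s) + 1;
    wchar_t *copy = malloc(n * sizeof(wchar_t));
    if (copy != NULL) {
        memcpy(copy, s, n * sizeof(wchar_t));
    }
    return copy;
}

/* Free every item and reset the list to the empty state. */
static void wstrlist_clear(WStrList *list)
{
    for (size_t i = 0; i < list->length; i++) {
        free(list->items[i]);
    }
    free(list->items);
    list->items = NULL;
    list->length = 0;
}

/* Append a copy of `item`; 0 on success, -1 on allocation failure. */
static int wstrlist_append(WStrList *list, const wchar_t *item)
{
    wchar_t *copy = wstr_dup(item);
    if (copy == NULL) {
        return -1;
    }
    wchar_t **grown = realloc(list->items,
                              (list->length + 1) * sizeof(wchar_t *));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[list->length] = copy;
    list->items = grown;
    list->length++;
    return 0;
}

/* Initialise `list` by copying `count` strings from caller-owned storage
   (which may be static). The source array is never stored or modified. */
static int wstrlist_init(WStrList *list, size_t count,
                         const wchar_t *const *src)
{
    assert(list != NULL);
    list->length = 0;
    list->items = NULL;
    for (size_t i = 0; i < count; i++) {
        if (wstrlist_append(list, src[i]) < 0) {
            wstrlist_clear(list);
            return -1;
        }
    }
    return 0;
}
```

With this shape, a caller can pass a static array to wstrlist_init(), append
to the result freely, and call wstrlist_clear() once at the end; the static
data is never touched. The cost is one extra copy per string.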

> > What about PyImport_FrozenModules? This is a global variable related to
> > Python initialization (it contains _frozen_importlib and
> > _frozen_importlib_external) but it is not accounted for in the PEP.
> > I rely on this struct in PyOxidizer to replace the importlib modules with
> > custom versions so we can do 0-copy in-memory import of Python bytecode
> > for the entirety of the standard library. Should the PyConfig have a
> > reference to the _frozen[] to use? Should the _frozen struct be made
> > part of the public API?
>
>
> First of all, PEP 587 is designed to be easily extendable :-) I added
> _config_version field to even provide backward ABI compatibility.
>
> Honestly, I never looked at PyImport_FrozenModules. It seems to fall
> into the same category as "importtab": a corner-case use case
> which cannot be easily generalized into the PyConfig structure.
>
> I would say the same as what I wrote about
> PyImport_AppendInittab(): the PyImport_FrozenModules symbol remains
> relevant and continues to work as expected. I understand that it must
> be set before the initialization, and it seems safe to set it even
> before the pre-initialization since it's a static array.
>
> Note: I renamed PyConfig._frozen to PyConfig.pathconfig_warnings: it's
> an int and it's unrelated to PyImport_FrozenModules.


>
> > I rely on this struct in PyOxidizer to replace the importlib modules with
> > custom versions so we can do 0-copy in-memory import of Python bytecode
> > for the entirety of the standard library.
>
> Wait, that sounds like a cool feature! Would it make sense to
> upstream this feature? If yes, maybe send a separate email to
> python-dev and/or open an issue.


> > The PEP mentions a private PyConfig._install_importlib member. I'm
> > curious what this is because it may be relevant to PyOxidizer. FWIW I
> > /might/ be interested in a mechanism to better control importlib
> > initialization because PyOxidizer is currently doing dirty things at
> > run-time to register the custom 0-copy meta path importer. I /think/ my
> > desired API would be a mechanism to control the name(s) of the frozen
> > module(s) to use to bootstrap importlib. Or there would be a way to
> > register the names of additional frozen modules to import and run as
> > part of initializing importlib (before any .py-based stdlib modules are
> > imported). Then PyOxidizer wouldn't need to hack up the source code to
> > importlib, compile custom bytecode, and inject it via
> > PyImport_FrozenModules. I concede this may be out of scope for the PEP.
> > But if the API is being reworked, I'd certainly welcome making it easier
> > for tools like PyOxidizer to work their crazy module importing magic :)
>
> PEP 587 is an incomplete implementation of PEP 432. We are
> discussing with Nick Coghlan, Steve Dower and some others about having
> 2 phases for the Python initialization: "core" and "main". The "core"
> phase would provide a bare minimum working Python: builtin exceptions
> and types, maybe builtin imports, and that's basically all. It would
> allow configuring Python using the newly created interpreter, for
> example by running Python code.
>
> The problem is that these 2 phases are not well defined yet; it's
> still under discussion. Nick and I agreed to start with PEP 587 as a
> first milestone, and see later how to implement the "core" and "main"
> phases.
>
> If the private field "_init_main" of PEP 587 is set to 0,
> Py_InitializeFromConfig() stops at the "core" phase (in fact, it's
> already implemented!). But I haven't yet implemented a
> _Py_InitializeMain() function to "finish" the initialization. Let's
> say that it exists; we would get:
>
> ---
> PyConfig config = PyConfig_INIT;
> config._init_main = 0;
> PyInitError err = Py_InitializeFromConfig(&config);
> if (Py_INIT_FAILED(err)) {
>     Py_ExitInitError(err);
> }
>
> /* add your code to customize Python here */
> /* calling PyRun_SimpleString() here is safe */
>
> /* finish Python initialization */
> err = _Py_InitializeMain(&config);
> if (Py_INIT_FAILED(err)) {
>     Py_ExitInitError(err);
> }
> ---
>
> Would it solve your use case?
>

FWIW, I understand the need here: for Hermetic Python, we solved it by
adding a new API similar to PyImport_AppendInittab, but instead registering
a generic callback hook to be called *during* the initialisation process:
after the base runtime and the import mechanism are initialised (at which
point you can create Python objects), but before *any* modules are
imported. We use that callback to insert a meta-importer that satisfies all
stdlib imports from an embedded archive. (Using a meta-importer allows us
to bypass the filesystem altogether, even for what would otherwise be
failed path lookups.)
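
The ordering described above can be sketched with a toy runtime in plain C.
Nothing here is CPython or Hermetic Python code: Runtime,
runtime_add_init_hook(), runtime_prepend_finder() and the two finders are
hypothetical names, and the "finders" merely return a label saying where a
module would come from. The sketch only shows the sequencing: hooks are
registered before initialisation, run once the core is usable, and run
before the first import is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HOOKS 8
#define MAX_FINDERS 8

/* A finder returns a label for where `name` comes from, or NULL. */
typedef const char *(*finder_fn)(const char *name);

/* Toy runtime state; every name here is hypothetical. */
typedef struct {
    int core_ready;                  /* "objects can be created" */
    size_t n_finders;
    finder_fn finders[MAX_FINDERS];  /* consulted in order, like sys.meta_path */
} Runtime;

typedef int (*init_hook_fn)(Runtime *rt);

static init_hook_fn g_hooks[MAX_HOOKS];
static size_t g_n_hooks;

/* Register a hook; must be called before runtime_init(). */
static int runtime_add_init_hook(init_hook_fn hook)
{
    if (g_n_hooks == MAX_HOOKS) {
        return -1;
    }
    g_hooks[g_n_hooks++] = hook;
    return 0;
}

/* Put a finder ahead of all existing ones. */
static int runtime_prepend_finder(Runtime *rt, finder_fn finder)
{
    assert(rt->core_ready);
    if (rt->n_finders == MAX_FINDERS) {
        return -1;
    }
    memmove(&rt->finders[1], &rt->finders[0],
            rt->n_finders * sizeof(finder_fn));
    rt->finders[0] = finder;
    rt->n_finders++;
    return 0;
}

/* Ask each finder in turn; the first non-NULL answer wins. */
static const char *runtime_import(const Runtime *rt, const char *name)
{
    for (size_t i = 0; i < rt->n_finders; i++) {
        const char *source = rt->finders[i](name);
        if (source != NULL) {
            return source;
        }
    }
    return NULL;
}

static const char *filesystem_finder(const char *name)
{
    (void)name;
    return "filesystem";
}

/* Claims every module, so the filesystem finder is never reached. */
static const char *embedded_finder(const char *name)
{
    (void)name;
    return "embedded archive";
}

/* Example hook: install the embedded finder in front of the default one. */
static int install_embedded_finder(Runtime *rt)
{
    return runtime_prepend_finder(rt, embedded_finder);
}

/* Core setup, then hooks, and only then the first import. */
static int runtime_init(Runtime *rt, const char **first_import_source)
{
    rt->core_ready = 1;
    rt->n_finders = 0;
    rt->finders[rt->n_finders++] = filesystem_finder;
    for (size_t i = 0; i < g_n_hooks; i++) {
        if (g_hooks[i](rt) < 0) {
            return -1;
        }
    }
    *first_import_source = runtime_import(rt, "site");
    return 0;
}
```

Registering install_embedded_finder before calling runtime_init() means the
very first import is already answered by the embedded finder; without the
hook it would fall through to the filesystem finder.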

As I mentioned, Hermetic Python was originally written for Python 2.7, but
this approach works fine with a frozen importlib as well. The idea of
'core' and 'main' initialisation will likely work for this, as well.

Other questions/comments about PEP 587:

I really like the PyInitError struct. I would like more functions to use
it, e.g. the PyRun_* "very high level" API, which currently calls exit()
for you on SystemExit, and returns -1 without any other information on
error. For those, I'm not entirely sure 'Init' makes sense in the name...
but I can live with it.
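
As an illustration of what a status-returning "run" call could look like,
here is a minimal sketch in plain C. RunStatus, run_simple_string() and
status_to_exitcode() are hypothetical names, not existing or proposed CPython
API, and the "interpreter" is a toy that only understands "pass" and
"exit N". The sketch shows the shape of the contract: an exit request is
reported to the caller as data instead of terminating the process.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { RUN_OK, RUN_ERROR, RUN_EXIT } RunKind;

/* Hypothetical result type, loosely modelled on the idea of PyInitError. */
typedef struct {
    RunKind kind;
    int exitcode;          /* meaningful only when kind == RUN_EXIT */
    const char *message;   /* static description when kind == RUN_ERROR */
} RunStatus;

/* Toy "interpreter": "pass" succeeds, "exit N" requests exit code N,
   anything else is reported as an error. It never calls exit(). */
static RunStatus run_simple_string(const char *code)
{
    RunStatus status = { RUN_OK, 0, NULL };
    assert(code != NULL);
    if (strcmp(code, "pass") == 0) {
        return status;
    }
    if (strncmp(code, "exit ", 5) == 0) {
        status.kind = RUN_EXIT;
        status.exitcode = atoi(code + 5);
        return status;
    }
    status.kind = RUN_ERROR;
    status.message = "unhandled exception";
    return status;
}

/* The embedder, not the library, decides what an exit request means. */
static int status_to_exitcode(RunStatus status)
{
    switch (status.kind) {
    case RUN_OK:
        return 0;
    case RUN_EXIT:
        return status.exitcode;
    default:
        return 1;
    }
}
```

An embedder that wants the current behaviour can still call
exit(status_to_exitcode(status)) itself; one that does not can log the
request and carry on.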

A couple of things are documented as performing pre-initialisation
(PyConfig_SetBytesString, PyConfig_SetBytesArgv). I understand why, but I
feel like that might be confusing and error-prone. Would it not be better
to have them fail if pre-initialisation hasn't been performed yet?
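
A minimal sketch of the "fail instead of implicitly pre-initialising"
alternative, in plain C. ToyRuntime, Status and config_set_string_strict()
are hypothetical names made up for this example; the real functions decode
bytes into wide strings, which this toy replaces with a plain copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status type: ok != 0 means success. */
typedef struct {
    int ok;
    const char *err_msg;   /* static string, NULL on success */
} Status;

/* Hypothetical runtime state with an explicit pre-initialisation flag. */
typedef struct {
    int preinitialized;
} ToyRuntime;

/* Strict setter: refuses to run before pre-initialisation rather than
   performing it as a side effect. On success *field owns a heap copy. */
static Status config_set_string_strict(const ToyRuntime *rt,
                                       char **field, const char *value)
{
    Status status = { 0, NULL };
    assert(rt != NULL && field != NULL && value != NULL);
    if (!rt->preinitialized) {
        status.err_msg = "pre-initialisation has not been performed";
        return status;
    }
    size_t n = strlen(value) + 1;
    char *copy = malloc(n);
    if (copy == NULL) {
        status.err_msg = "memory allocation failed";
        return status;
    }
    memcpy(copy, value, n);
    free(*field);          /* replace any previous value */
    *field = copy;
    status.ok = 1;
    return status;
}
```

The trade-off is that callers must order their calls explicitly, but a
forgotten pre-initialisation shows up as an error at the call site instead
of silently locking in default settings.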

The buffered_stdio field of PyConfig mentions stdout and stderr, but not
stdin. Does it not affect stdin? (Many of the fields could do with a bit
more explicit documentation, to be honest.)

The configure_c_stdio field of PyConfig sounds like it might not set
sys.stdin/stdout/stderr. That would be new behaviour, but configure_c_stdio
doesn't have an existing equivalent, so I'm not sure if that's what you
meant or not.

The dll_path field of PyConfig says "Windows only". Does that mean the
struct doesn't have that field except in a Windows build? Or is it ignored,
instead? If it doesn't have that field at all, what #define can be used to
determine if the PyConfig struct will have it or not?

It feels a bit weird to have both 'inspect' and 'interactive' in PyConfig.
Is there a substantive difference between them? Is this just so you can
easily tell if any of run_module / run_command / run_filename are set?

"module_search_path_env" sounds like an awkward and somewhat misleading
name for the translation of PYTHONPATH. Can we not just use, say,
pythonpath_env? I expect the intended audience to know that PYTHONPATH !=
sys.path.

The module_search_paths field in PyConfig doesn't mention if it's setting
or adding to the calculated sys.path. As a whole, the path-calculation bits
are a bit under-documented. Since this is an awkward bit of CPython, it
wouldn't hurt to mention what "the default path configuration" does (i.e.
search for python's home starting at program_name, add fixed subdirs to it,
etc.)

Path configuration is mentioned as being able to issue warnings, but it
doesn't mention *how*. It can't be the warnings module at this stage. I
presume it's just printing to stderr.

Regarding Py_RunMain(): does it do the right thing when something calls
PyErr_Print() with SystemExit set? (I mentioned last week that
PyErr_Print() will call C's exit() in that case, which is obviously
terrible for embedders.)

Regarding isolated_mode and the site module, should we make stronger
guarantees about site.py's behaviour being optional? The problem with site
is that it does four things that aren't configurable, one of which is
usually very desirable, one of which probably doesn't matter to embedders,
and two that are iffy: sys.path deduplication and canonicalisation (and
fixing up __file__/__cached__ attributes of already-imported modules);
adding site-packages directories; looking for and importing
sitecustomize.py; executing .pth files. The site module doesn't easily
allow doing only some of these. (user-site directories are an exception, as
they have their own flag, so I'm not listing that here.) With Hermetic
Python we don't care about any of these (for a variety of different
reasons), but I'm always a little worried that future Python versions would
add behaviour to site that we *do* need.

(As a side note, here's an issue I forgot to talk about last week: with
Hermetic Python's meta-importers we have an ancillary regular import hook
for correctly dealing with packages with modified __path__, so that for
example 'xml' from the embedded stdlib zip can still import '_xmlplus' from
the filesystem or a separate zip, and append its __path__ entries to its
own. To do that, we use a special prefix for the embedded archive
meta-importers; we don't want to use a file name because they are not files on
disk. The prefixes used to be something like '<embedded archive XXX at
YYY>'. This works fine, and with correct ordering of import hooks nothing
will try to find files named '<embedded archive XXX at YYY>'... until user
code imports site for some reason, which then canonicalises sys.path,
replacing the magic prefixes with '/path/to/cwd/<embedded archive XXX at
YYY>'. We've since made the magic prefixes start with /, but I'm not happy
with it :P)
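
The effect on the magic prefixes can be reproduced with a simplified model
in plain C. make_absolute() is a hypothetical helper, not code from site.py
or Hermetic Python; it assumes POSIX-style paths and only joins relative
entries onto the current directory, whereas the real canonicalisation also
normalises the result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified model of making a sys.path entry absolute (POSIX paths only):
   entries starting with '/' are kept, everything else is joined onto `cwd`.
   Returns 0 on success, -1 if `out` is too small. */
static int make_absolute(const char *cwd, const char *entry,
                         char *out, size_t out_size)
{
    int n;
    assert(cwd != NULL && entry != NULL && out != NULL);
    if (entry[0] == '/') {
        n = snprintf(out, out_size, "%s", entry);
    }
    else {
        n = snprintf(out, out_size, "%s/%s", cwd, entry);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Under this model '<embedded archive XXX at YYY>' is treated as a relative
path and gets the current directory prepended, while the same marker with a
leading '/' passes through unchanged, which is why the workaround holds.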

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm an email virus! Think twice before sending your email to help me
spread!