[Python-Dev] RFC: PEP 587 "Python Initialization Configuration": 3rd version
Victor Stinner
vstinner at redhat.com
Mon May 20 07:09:28 EDT 2019
Hi Gregory,
IMHO your remarks are not directly related to PEP 587 and can be
addressed in parallel.
> It sounds like PyOxidizer and Hermetic Python are on the same page and
> we're working towards a more official solution. But I want to make sure
> by explicitly stating what PyOxidizer is doing.
>
> Essentially, to facilitate in-memory import, we need to register a
> custom sys.meta_path importer *before* any file-based imports are
> attempted. (...)
> I /think/ providing a 2-phase
> initialization that stops between _Py_InitializeCore() and
> _Py_InitializeMainInterpreter() would get the job done for PyOxidizer
> today. (...)
Extract from PEP 587: "This extracts a subset of the API design from the
PEP 432 development and refactoring work that is now considered
sufficiently stable to make public (allowing 3rd party embedding
applications access to the same configuration APIs that the native
CPython CLI is now using)."
We know that my PEP 587 is incomplete, but the work will continue in
Python 3.9 to support your use case.
PEP 587 introduces an experimental separation between "core" and
"main" initialization phases. Setting PyConfig._init_main=0 stops at the
"core" phase, after which you are free to run C and Python code. Calling
_Py_InitializeMain() then finishes the Python initialization ("main"
phase).
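
As an illustration of what could be registered between the two phases, here is a minimal Python sketch of an in-memory sys.meta_path importer. The class name InMemoryFinder and the module name hello_mem are hypothetical, and this is not how PyOxidizer is implemented; it only shows that an importer placed first in sys.meta_path is consulted before the file-based finders.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer that serves module source from a dict."""

    def __init__(self, sources):
        # Maps a fully qualified module name to its source text.
        self._sources = dict(sources)

    def find_spec(self, fullname, path=None, target=None):
        # Returning None lets the next finder in sys.meta_path try.
        if fullname not in self._sources:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # None asks the import system to create a default module object.
        return None

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Register the finder first so it is consulted before file-based finders.
finder = InMemoryFinder({"hello_mem": "GREETING = 'hi from memory'\n"})
sys.meta_path.insert(0, finder)

import hello_mem  # resolved from the dict, no file system access needed
```

In an embedding application the registration would have to happen after the "core" phase and before the "main" phase, since that is when file-based imports start.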
> > In the "Isolate Python" section, I suggest to set the "isolated"
> > parameter to 1 which implies setting user_site_directory to 0. So
> > sys.path isn't modified afterwards. What you pass to PyConfig is what
> > you get in sys.path in this case.
>
> Regarding site.py, I agree it is problematic for embedding scenarios.
> Some features of site.py can be useful. Others aren't. It would be
> useful to have more granular control over which bits of site.run are
> run. My naive suggestion would be to add individual flags to control
> which functions site.py:main() runs. That way embedders can cherry-pick
> site.py features without having to manually import the module and call
> functions within. That feels much more robust for long-term maintainability.
I agree that more work can be done on the site module. IMHO core
features which are needed by everybody should be done before calling
site, maybe using a frozen "presite" module or something similar. I would
be interested in making it possible to use Python in most cases without
the site module.
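
The effect of isolated mode and of skipping site can be observed from the command line. The sketch below assumes that the -I and -S options correspond to the isolated and site_import settings of PyConfig; the helper name probe_flags is hypothetical. It starts a child interpreter and reports the resulting sys.flags values.

```python
import json
import subprocess
import sys

# Code run in the child interpreter. The check for "site" is made before
# importing json so that the probe itself cannot affect the answer.
PROBE = (
    "import sys; "
    "site_imported = 'site' in sys.modules; "
    "import json; "
    "print(json.dumps({"
    "'isolated': sys.flags.isolated, "
    "'no_user_site': sys.flags.no_user_site, "
    "'no_site': sys.flags.no_site, "
    "'site_imported': site_imported}))"
)


def probe_flags(*options):
    """Run a child interpreter with the given options and return its flags."""
    result = subprocess.run(
        [sys.executable, *options, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


isolated = probe_flags("-I")      # isolated mode, site still imported
bare = probe_flags("-I", "-S")    # isolated mode without the site module
print(isolated)
print(bare)
```

With -I alone the user site directory is disabled but site is still imported at startup; adding -S skips the import entirely.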
> Regarding Python calling exit(), this is problematic for embedding
> scenarios.
I am working on that. I fixed dozens of functions. For example,
Py_RunMain() should no longer exit if there is an uncaught SystemExit
when calling PyErr_Print(). SystemExit is now handled separately
before calling PyErr_Print(). The work is not done, but it should be
way better than the state of Python 3.6 and 3.7.
> This thread called attention to exit() during interpreter
> initialization. But it is also a problem elsewhere. For example,
> PyErr_PrintEx() will call Py_Exit() if the exception is a SystemExit.
> There's definitely room to improve the exception handling mechanism to
> give embedders better control when SystemExit is raised. As it stands,
> we need to check for SystemExit manually and reimplement
> _Py_HandleSystemExit() to emulate its behavior for e.g. exception value
> handling (fun fact: you can pass non-None, non-integer values to
> sys.exit/SystemExit).
I don't know these functions well; maybe new functions are needed. It
can be done outside PEP 587.
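
For reference, the exit status rules that an embedder has to emulate can be sketched in Python. This is an approximation of the behavior described above, not CPython's actual C code, and the names exit_status_from and run_embedded are hypothetical: None maps to 0, an integer is used as is, and any other value is printed to stderr with status 1.

```python
import io
import sys


def exit_status_from(exc, stream=None):
    """Map a SystemExit instance to a process exit status."""
    value = exc.code
    if value is None:
        return 0
    if isinstance(value, int):
        return value
    # Non-None, non-integer values are printed and treated as failure.
    print(value, file=stream if stream is not None else sys.stderr)
    return 1


def run_embedded(func, stream=None):
    """Call func() and return an exit status instead of exiting the process."""
    try:
        func()
    except SystemExit as exc:
        return exit_status_from(exc, stream)
    return 0


captured = io.StringIO()
status_none = run_embedded(lambda: sys.exit())
status_int = run_embedded(lambda: sys.exit(3))
status_text = run_embedded(lambda: sys.exit("fatal: bad config"), captured)
```

The caller gets a status back and decides what to do with it, which is the control an embedding application needs.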
> Having a more efficient member lookup for BuiltinImporter and
> FrozenImporter might shave off a millisecond or two from startup. This
> would require some kind of derived data structure. (...)
I don't think that the structures/API to define frozen/builtin modules
have to change. We can convert these lists into a hash table during
the initialization of the importlib module.
I'm not saying that the current API is perfect, just that IMHO this can
be solved without changing the API.
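
The idea can be sketched in Python as follows. FrozenEntry and FROZEN_TABLE are hypothetical stand-ins for the C-level array of frozen modules, and the sizes are made up for illustration; the point is only that the list keeps its existing shape while a dict is derived from it once.

```python
from typing import NamedTuple, Optional


class FrozenEntry(NamedTuple):
    """Hypothetical stand-in for one entry of the frozen module table."""
    name: str
    size: int


# Illustrative data only; the sizes are not real.
FROZEN_TABLE = [
    FrozenEntry("_frozen_importlib", 100),
    FrozenEntry("_frozen_importlib_external", 200),
    FrozenEntry("zipimport", 300),
]


def linear_lookup(table, name) -> Optional[FrozenEntry]:
    """Scan the whole list, as a plain array lookup would."""
    for entry in table:
        if entry.name == name:
            return entry
    return None


def build_index(table):
    """Build the hash table once, keeping the first entry for each name."""
    index = {}
    for entry in table:
        index.setdefault(entry.name, entry)
    return index


INDEX = build_index(FROZEN_TABLE)  # done once during initialization
found = INDEX.get("zipimport")
missing = INDEX.get("not_frozen")
```

The index is only valid as long as the list is not modified after it is built.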
> Unfortunately, as long as there is a global data structure that can be mutated any time
> (the API contract doesn't prohibit modifying these global arrays after
> initialization), you would need to check for "cache invalidation" on
> every lookup, undermining performance benefits.
Do you really expect an application to modify these lists dynamically?
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.