[Python-Dev] RFC: PEP 587 "Python Initialization Configuration": 3rd version

Fri May 31 23:43:36 EDT 2019

On 5/20/2019 4:09 AM, Victor Stinner wrote:
> Hi Gregory,
> 
> IMHO your remarks are not directly related to PEP 587 and can be
> addressed in parallel.
> 
>> It sounds like PyOxidizer and Hermetic Python are on the same page and
>> we're working towards a more official solution. But I want to make sure
>> by explicitly stating what PyOxidizer is doing.
>>
>> Essentially, to facilitate in-memory import, we need to register a
>> custom sys.meta_path importer *before* any file-based imports are
>> attempted. (...)
>> I /think/ providing a 2-phase
>> initialization that stops between _Py_InitializeCore() and
>> _Py_InitializeMainInterpreter() would get the job done for PyOxidizer
>> today. (...)
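For readers unfamiliar with the mechanism, here is a minimal Python-level sketch of a sys.meta_path importer that serves modules from memory. It is an illustration, not PyOxidizer's code: the dict of source strings and the names InMemoryImporter and hello_mem are hypothetical stand-ins for module data embedded in an executable. Registering the finder from Python, as done here, can only happen after startup, by which time some file-based imports may already have occurred. That is why a hook point between the "core" and "main" phases is being asked for.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical in-memory module store: module name -> Python source.
# A real embedder would read this data out of its own executable.
IN_MEMORY_SOURCES = {
    "hello_mem": "GREETING = 'hello from memory'\n",
}


class InMemoryImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that serves modules from a dict of source strings."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self._sources:
            return None  # defer to the next finder on sys.meta_path
        return importlib.util.spec_from_loader(fullname, self, origin="memory")

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<memory:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Index 0: consulted before the default path-based finder.
sys.meta_path.insert(0, InMemoryImporter(IN_MEMORY_SOURCES))

import hello_mem

print(hello_mem.GREETING)  # prints: hello from memory
```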
> 
> Extract of PEP 587: "This extracts a subset of the API design from the
> PEP 432 development and refactoring work that is now considered
> sufficiently stable to make public (allowing 3rd party embedding
> applications access to the same configuration APIs that the native
> CPython CLI is now using)."
> 
> We know that my PEP 587 is incomplete, but the work will continue in
> Python 3.9 to support your use case.
> 
> PEP 587 introduces an experimental separation between "core" and
> "main" initialization phases. PyConfig._init_main=0 stops at the
> "core" phase; you are then free to run C and Python code, and
> _Py_InitializeMain() finishes the Python initialization ("main"
> phase).
> 
> 
>>> In the "Isolate Python" section, I suggest setting the "isolated"
>>> parameter to 1, which implies setting user_site_directory to 0, so
>>> sys.path isn't modified afterwards. What you pass to PyConfig is what
>>> you get in sys.path in this case.
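The "isolated" setting has a command-line counterpart, -I, which also implies -E (ignore PYTHON* environment variables) and -s (no user site-packages directory). A quick way to observe the effect without writing any C is to start a child interpreter with -I and print the relevant sys.flags. This is a sketch of the command-line equivalent only, not of the PyConfig C API itself.

```python
import subprocess
import sys

# Start a child interpreter in isolated mode (-I) and report its flags.
# -I implies -E (ignore PYTHON* environment variables) and -s (do not add
# the user site-packages directory to sys.path).
CHILD = (
    "import sys; "
    "print(sys.flags.isolated, sys.flags.ignore_environment, "
    "sys.flags.no_user_site)"
)
result = subprocess.run(
    [sys.executable, "-I", "-c", CHILD],
    capture_output=True,
    text=True,
    check=True,
)
isolated, ignore_env, no_user_site = map(int, result.stdout.split())
print(isolated, ignore_env, no_user_site)  # prints: 1 1 1
```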
>>
>> Regarding site.py, I agree it is problematic for embedding scenarios.
>> Some features of site.py can be useful. Others aren't. It would be
>> useful to have more granular control over which bits of site.run are
>> run. My naive suggestion would be to add individual flags to control
>> which functions site.py:main() runs. That way embedders can cherry-pick
>> site.py features without having to manually import the module and call
>> functions within. That feels much more robust for long-term maintainability.
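The per-feature flag idea can be sketched in Python. The flag names (install_quit, install_copyright, install_help) and run_site_steps() are hypothetical; they are not PyConfig fields or site options. The callables they map to, site.setquit(), site.setcopyright() and site.sethelper(), do exist in Lib/site.py and are called by site.main(), but they are undocumented internals, so relying on them across versions is an assumption.

```python
import builtins
import site

# Hypothetical per-feature switches; NOT real PyConfig fields or site
# options. The values are existing but undocumented helpers in site.py.
SITE_STEPS = {
    "install_quit": site.setquit,            # builtins.quit / builtins.exit
    "install_copyright": site.setcopyright,  # builtins.copyright/credits/license
    "install_help": site.sethelper,          # builtins.help
}


def run_site_steps(enabled):
    """Run only the site.py steps whose flag is set; return the names run."""
    ran = []
    for name, func in SITE_STEPS.items():
        if enabled.get(name, False):
            func()
            ran.append(name)
    return ran


# An embedder that wants help() but nothing else from this group:
print(run_site_steps({"install_help": True}))  # prints: ['install_help']
print(callable(builtins.help))                  # prints: True
```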
> 
> I agree that more work can be done on the site module. IMHO core
> features which are needed by everybody should be done before calling
> site. Maybe using a frozen "presite" module or whatever. I would be
> interested in making it possible to use Python for most cases without
> the site module.
> 
> 
>> Regarding Python calling exit(), this is problematic for embedding
>> scenarios.
> 
> I am working on that. I fixed dozens of functions. For example,
> Py_RunMain() should no longer exit if there is an uncaught SystemExit
> when calling PyErr_Print(). SystemExit is now handled separately
> before calling PyErr_Print(). The work is not done, but it should be
> way better than the Python 3.6 and 3.7 state.
> 
> 
>> This thread called attention to exit() during interpreter
>> initialization. But it is also a problem elsewhere. For example,
>> PyErr_PrintEx() will call Py_Exit() if the exception is a SystemExit.
>> There's definitely room to improve the exception handling mechanism to
>> give embedders better control when SystemExit is raised. As it stands,
>> we need to check for SystemExit manually and reimplement
>> _Py_HandleSystemExit() to emulate its behavior for e.g. exception value
>> handling (fun fact: you can pass non-None, non-integer values to
>> sys.exit/SystemExit).
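To make the "reimplement _Py_HandleSystemExit()" point concrete, here is an approximate Python rendering of how the interpreter turns a SystemExit into an exit status: a code of None means 0, an int is used as-is, and any other object is printed to stderr and yields 1. The function name system_exit_status() is hypothetical, and the sketch ignores C-level details such as truncation of very large integers.

```python
import io
import sys


def system_exit_status(exc, stderr=None):
    """Approximate _Py_HandleSystemExit(): map a SystemExit to an exit
    status, writing non-integer codes to stderr as the interpreter does."""
    stderr = sys.stderr if stderr is None else stderr
    code = exc.code
    if code is None:
        return 0
    if isinstance(code, int):  # bool is an int subclass, so True -> 1
        return int(code)
    # Any other object (e.g. a string) is printed and the status is 1.
    print(code, file=stderr)
    return 1


buf = io.StringIO()
try:
    sys.exit("fatal: bad config")
except SystemExit as exc:
    status = system_exit_status(exc, stderr=buf)
print(status, repr(buf.getvalue()))  # prints: 1 'fatal: bad config\n'
```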
> 
> I don't know these functions well; maybe new functions are needed. It
> can be done outside of PEP 587.
> 
> 
>> Having a more efficient member lookup for BuiltinImporter and
>> FrozenImporter might shave off a millisecond or two from startup. This
>> would require some kind of derived data structure. (...)
> 
> I don't think that the structures/API to define frozen/builtin modules
> has to change. We can convert these lists into a hash table during
> the initialization of the importlib module.
> 
> I'm not saying that the current API is perfect, just that IMHO it can
> be solved without changing the API.
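The "convert these lists into a hash table" idea, sketched in Python. BUILTIN_TABLE is a hypothetical stand-in for C arrays such as PyImport_Inittab or PyImport_FrozenModules, with strings in place of init function pointers. The index is built once, and keeping the first entry for a duplicated name preserves the first-match result of the linear scan.

```python
# Hypothetical stand-in for a C module table: ordered (name, payload)
# entries; payload strings stand in for init function pointers.
BUILTIN_TABLE = [
    ("_abc", "init_abc"),
    ("_io", "init_io"),
    ("marshal", "init_marshal"),
    ("posix", "init_posix"),
]


def linear_lookup(table, name):
    """O(n) scan, roughly how a C loop over the array works."""
    for entry_name, payload in table:
        if entry_name == name:
            return payload
    return None


def build_index(table):
    """Build a dict once (e.g. while importlib is initialized).
    setdefault keeps the first entry, matching the scan's first match."""
    index = {}
    for entry_name, payload in table:
        index.setdefault(entry_name, payload)
    return index


INDEX = build_index(BUILTIN_TABLE)
print(INDEX.get("_io"), linear_lookup(BUILTIN_TABLE, "_io"))  # prints: init_io init_io
print(INDEX.get("missing"))  # prints: None
```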
> 
> 
>> Unfortunately, as long as there is a global data structure that can be mutated at any time
>> (the API contract doesn't prohibit modifying these global arrays after
>> initialization), you would need to check for "cache invalidation" on
>> every lookup, undermining performance benefits.
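One common way to pay for that invalidation check is a version counter: every mutation bumps it, and each lookup compares it with the version the cache was built from. The class below is a hypothetical sketch of that technique. It only works because mutations go through append(); a plain C array that embedders can overwrite directly offers no such hook, so detecting staleness would require something costlier, such as rescanning the array on every lookup.

```python
class MutableModuleTable:
    """Hypothetical mutable module table with a version-stamped cache."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._version = 0
        self._cache = None
        self._cache_version = -1
        self.rebuilds = 0  # counts cache rebuilds, for illustration

    def append(self, name, payload):
        self._entries.append((name, payload))
        self._version += 1  # every mutation invalidates the cache

    def lookup(self, name):
        # The per-lookup "is the cache stale?" check.
        if self._cache_version != self._version:
            self._cache = {}
            for entry_name, payload in self._entries:
                self._cache.setdefault(entry_name, payload)
            self._cache_version = self._version
            self.rebuilds += 1
        return self._cache.get(name)


table = MutableModuleTable([("_io", "init_io")])
table.lookup("_io")
table.lookup("_io")  # cache still valid, no rebuild
table.append("spam", "init_spam")
print(table.lookup("spam"), table.rebuilds)  # prints: init_spam 2
```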
> 
> Do you really expect an application modifying these lists dynamically?

At this time, not really. But the mark of good API design IMO is that
its flexibility empowers new and novel ideas and ways of doing things.
Bad APIs (including the use of global variables) inhibit flexibility and
constrain creativity and advanced usage.

For this particular item, I could see some potential uses in processes
hosting multiple, independent interpreters. Maybe you want to give each
interpreter its own set of modules. That's not something you see today
because of the GIL and all the other global state in CPython. But with
the GIL's days apparently being numbered, who knows what the future holds.

I would highly encourage the official API surface to do away with
globals completely and for the internals to only use globals in ways
that impose minimal restrictions/caveats on usage/behavior. This PEP,
along with others, is a huge step in the right direction.