[Python-Dev] New Python Initialization API

Victor Stinner vstinner at redhat.com
Wed Mar 27 13:48:59 EDT 2019


Hi,

I would like to add a new C API to initialize Python. I would like
your opinion on the whole API before making it public. The code is
already implemented. Doc of the new API:

   https://pythondev.readthedocs.io/init_config.html


To make the API public, _PyWstrList, _PyInitError, _PyPreConfig,
_PyCoreConfig and related functions should be made public.

By the way, I would suggest renaming "_PyCoreConfig" to just
"PyConfig" :-) I don't think that "core init" vs "main init" is really
relevant: more about that below.


Let's start with two examples using the new API.

Example of simple initialization to enable isolated mode:

    _PyCoreConfig config = _PyCoreConfig_INIT;
    config.isolated = 1;

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }
    /* ... use Python API here ... */
    Py_Finalize();

Example using the pre-initialization to enable the UTF-8 Mode (and use the
"legacy" Py_Initialize() function):

    _PyPreConfig preconfig = _PyPreConfig_INIT;
    preconfig.utf8_mode = 1;

    _PyInitError err = _Py_PreInitialize(&preconfig);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }

    /* at this point, Python will only speak UTF-8 */

    Py_Initialize();
    /* ... use Python API here ... */
    Py_Finalize();

Since November 2017, I have been refactoring the Python Initialization
code to clean it up and prepare a new ("better") API to configure
Python Initialization. I just fixed the last issues that Nick Coghlan
asked me to fix (add a pre-initialization step: done, fix mojibake:
done). My work is inspired by Nick Coghlan's PEP 432, but it does not
implement it directly. My motivations differ from Nick's, even if we
are heading in roughly the same direction.

Nick wants to get a half-initialized Python ("core init"), configure
Python using the Python API and Python objects, and then finish the
initialization ("main init").

I chose a different approach: put *everything* into a single C
structure (_PyCoreConfig) using C types. Using the structure, you
should be able to do what Nick wanted to do, but with C rather than
Python. Nick: please tell me if I'm wrong :-)

This work is also connected to Eric Snow's work on sub-interpreters
(PEP 554) and moving global variables into structures. For example,
I'm using his _PyRuntime structure to store a new "preconfig" state
(pre-initialization configuration, more about that below).

In November 2017, when I started to work on the Python Initialization
(bpo-32030), I identified the following problems:

* Many parts of the code were interdependent
* Code executed early in Py_Main() used the Python API before the Python API
  was fully initialized. For example, the code parsing the -W command line
  option used PyUnicode_FromWideChar() and PyList_Append().
* Error handling used Py_FatalError(), which didn't let the caller decide
  how to handle the error. Moreover, exit() was used to exit Python,
  whereas libpython shouldn't do that: a library should not exit the
  whole process! (imagine when Python is embedded inside an application)

A year and a half later, I have implemented the following solutions:

* Py_Main() and Py_Initialize() code has been reorganized to respect
  priorities between global configuration variables (ex:
  Py_IgnoreEnvironmentFlag), environment variables (ex: PYTHONPATH), command
  line arguments (ex: -X utf8), configuration files (ex: pyvenv.cfg), and the
  new _PyPreConfig and _PyCoreConfig structures which store the whole
  configuration.
* Python Initialization no longer uses the Python API but only C types
  like wchar_t* strings, a new _PyWstrList structure and PyMem_RawMalloc()
  memory allocator (PyMem_Malloc() is no longer used during init).
* The code has been modified to use a new _PyInitError structure. The caller
  of the top function gets control to clean up everything before handling the
  error (display a fatal error message or simply exit Python).
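
The error-handling change can be illustrated with a minimal, standalone
sketch. It does not use the real _PyInitError: the fields of that structure
are not shown in this message, so init_error, INIT_OK, INIT_ERR, INIT_FAILED,
demo_state, demo_step and demo_initialize are all hypothetical names chosen
for illustration. The point is only the pattern: each step returns an error
value instead of calling exit(), so the top-level caller can clean up first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an init error value (NOT the real
   _PyInitError layout, which is an assumption here). */
typedef struct {
    const char *msg;   /* NULL means success */
    int exitcode;      /* only meaningful when msg != NULL */
} init_error;

#define INIT_OK()       ((init_error){NULL, 0})
#define INIT_ERR(m)     ((init_error){(m), 1})
#define INIT_FAILED(e)  ((e).msg != NULL)

/* Hypothetical state tracking what has been set up so far. */
typedef struct {
    int steps_done;
} demo_state;

/* One initialization step: reports failure to the caller, never exits. */
init_error demo_step(demo_state *state, int should_fail)
{
    assert(state != NULL);
    if (should_fail) {
        return INIT_ERR("demo_step failed");
    }
    state->steps_done++;
    return INIT_OK();
}

/* Top-level function: on failure, it undoes the completed steps and
   hands the error back so the embedding application decides what to do. */
init_error demo_initialize(demo_state *state, int fail_second_step)
{
    init_error err = demo_step(state, 0);
    if (INIT_FAILED(err)) {
        return err;
    }
    err = demo_step(state, fail_second_step);
    if (INIT_FAILED(err)) {
        state->steps_done = 0;   /* clean up before reporting */
        return err;
    }
    return INIT_OK();
}
```

An embedding application would call demo_initialize(), test the result with
INIT_FAILED(), and then choose between printing the message, retrying, or
exiting on its own terms.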

The new _PyCoreConfig structure has the highest priority and provides a
single structure for all configuration parameters.
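
A minimal sketch of what "highest priority" means when one setting can come
from several sources. Everything here is hypothetical: demo_config,
DEMO_UNSET and resolve_utf8_mode are illustration names, the use of -1 as an
"unset" marker is an assumption, and the relative order of the lower layers
(command line over environment over global default) is assumed as well. Only
the rule that the structure wins comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UNSET (-1)   /* assumption: -1 marks "not configured" */

/* Hypothetical configuration structure with a single setting. */
typedef struct {
    int utf8_mode;        /* DEMO_UNSET, 0 or 1 */
} demo_config;

/* Resolve one setting from the highest-priority source that defines it.
   env_value is passed in (rather than read with getenv) to keep the
   sketch deterministic. */
int resolve_utf8_mode(const demo_config *config, int cmdline_value,
                      const char *env_value, int global_default)
{
    assert(config != NULL);
    if (config->utf8_mode != DEMO_UNSET) {
        return config->utf8_mode;          /* structure: top priority */
    }
    if (cmdline_value != DEMO_UNSET) {
        return cmdline_value;              /* command line option */
    }
    if (env_value != NULL && env_value[0] != '\0') {
        return strcmp(env_value, "0") != 0; /* environment variable */
    }
    return global_default;                 /* global configuration variable */
}
```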

It becomes possible to override the code computing the "path configuration"
like sys.path to fully control where Python looks to import modules. It
becomes possible to use an empty list of paths to only allow builtin modules.
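
Holding a list of search paths in plain C types could look roughly like the
sketch below. This is not the real _PyWstrList: its layout and functions are
not shown in this message, so wstr_list, wstr_list_append and wstr_list_clear
are hypothetical, and malloc()/realloc() stand in for the raw memory
allocator. A list left at length 0 corresponds to the "empty list of paths"
case.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical list of wide strings (NOT the real _PyWstrList). */
typedef struct {
    size_t length;
    wchar_t **items;
} wstr_list;

/* Append a copy of item. Returns 0 on success, -1 on memory error;
   the list is left unchanged on failure. */
int wstr_list_append(wstr_list *list, const wchar_t *item)
{
    assert(list != NULL && item != NULL);

    size_t len = wcslen(item) + 1;          /* include the terminator */
    wchar_t *copy = malloc(len * sizeof(wchar_t));
    if (copy == NULL) {
        return -1;
    }
    wmemcpy(copy, item, len);

    wchar_t **items = realloc(list->items,
                              (list->length + 1) * sizeof(wchar_t *));
    if (items == NULL) {
        free(copy);
        return -1;
    }
    items[list->length] = copy;
    list->items = items;
    list->length++;
    return 0;
}

/* Release every string and reset the list to the empty state. */
void wstr_list_clear(wstr_list *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->length; i++) {
        free(list->items[i]);
    }
    free(list->items);
    list->items = NULL;
    list->length = 0;
}
```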

A new "pre-initialization" step is responsible for configuring the bare
minimum before the Python initialization: memory allocators and encodings
(LC_CTYPE locale and the UTF-8 Mode). To prevent mojibake, the LC_CTYPE
locale is no longer coerced and the UTF-8 Mode is no longer enabled
automatically depending on the user configuration. Previously, calling
Py_DecodeLocale() to get a Unicode wchar_t* string from a bytes char*
string created mojibake when called before Py_Initialize() if the LC_CTYPE
locale was coerced and/or if the UTF-8 Mode was enabled.

The pre-initialization step ensures that the encodings and memory allocators
are well defined *before* Py_Initialize() is called.

Since the new API is currently private, I didn't document it in
Python. Moreover, the code changed a lot last year :-) But it should
now be way more stable. I started to document it on a separate
webpage:

   https://pythondev.readthedocs.io/init_config.html

The plan is to put it in the Python documentation once it becomes public.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

