[Import-SIG] PEP 489: Multi-phase extension module initialization; version 7

Petr Viktorin encukou at gmail.com
Thu May 21 13:27:16 CEST 2015

Based on the last round of comments, I've sent changes to PEP editors.

There is one functional change:
- Don't allow execution slots for non-module subclasses

and several wording fixes/clarifications:
- Remove misleading reason for PyModuleDef_Init
- Clarify that sys.modules is not checked between execution steps
- Add a Backwards Compatibility summary
- Heading level fix, typo fix

The full text follows:

PEP: 489
Title: Multi-phase extension module initialization
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou at gmail.com>,
        Stefan Behnel <stefan_ml at behnel.de>,
        Nick Coghlan <ncoghlan at gmail.com>
BDFL-Delegate: Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015


Abstract

This PEP proposes a redesign of the way in which built-in and extension modules
interact with the import machinery. This interaction was last revised for
Python 3.0 in PEP 3121, which did not solve all problems at the time.
The goal is to solve
import-related problems by bringing extension modules closer to the way Python
modules behave; specifically to hook into the ModuleSpec-based loading
mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
authors to only define features they need, and to allow future additions
to extension module declarations.

Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.

The proposal also allows extension modules with non-ASCII names.

Not all problems tackled in PEP 3121 are solved in this proposal.
In particular, problems with run-time module lookup (PyState_FindModule)
are left to a future PEP.


Motivation

Python modules and extension modules are not set up in the same way.
For Python modules, the module object is created and set up first, then the
module code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

In Python 3, extension modules are also not added to sys.modules before their
init function runs, which means that a (potentially transitive) re-import of
the module will attempt the import again and run into an infinite loop when it
executes the module init function
again. Without access to the fully-qualified module name, it is not trivial to
correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or interpreter reloading, and, while
it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and give
extension authors adequate time to consider these issues when porting.

The current process

Currently, extension and built-in modules export an initialization function
"PyInit_modulename", named after the file name of the shared library.
This function is executed by the import machinery and must return a fully
initialized module object.
The function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef object. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the fully qualified name of the module object. This is not entirely safe as it
relies on the module init function creating its own module object first,
but this assumption usually holds in practice.

The proposal

The initialization function (PyInit_modulename) will be allowed to return
a pointer to a PyModuleDef object. The import machinery will be in charge
of constructing the module object, calling hooks provided in the PyModuleDef
in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase
initialization, the current practice of returning a fully initialized module
object, will still be accepted, so existing code will work unchanged,
including binary compatibility.

The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module storage),
the currently unused m_reload pointer of PyModuleDef will be changed to
hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.
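
The slot-array rules above can be modeled in a few lines of Python. This is
only an illustrative sketch, not CPython's C implementation: the numeric slot
IDs and the ``iter_slots`` helper are assumptions made for the example, and
``None`` stands in for NULL.

```python
# Illustrative Python model of the slot-array rules; not CPython code.
Py_mod_create = 1   # slot ID values are assumptions for this sketch
Py_mod_exec = 2
KNOWN_SLOTS = {Py_mod_create, Py_mod_exec}


def iter_slots(m_slots):
    """Yield (slot_id, value) pairs up to the {0, NULL} terminator."""
    if m_slots is None:          # m_slots == NULL: no slots at all
        return
    for slot_id, value in m_slots:
        if slot_id == 0:         # terminator entry ends the array
            return
        if slot_id not in KNOWN_SLOTS:
            raise SystemError("unknown slot ID %d" % slot_id)
        if value is None:        # a NULL value pointer is not allowed
            raise SystemError("slot %d has a NULL value" % slot_id)
        yield slot_id, value


def noop_exec(module):
    """Hypothetical execution function that does nothing."""
    return 0


slots = [(Py_mod_exec, noop_exec), (0, None)]
print([slot_id for slot_id, _ in iter_slots(slots)])
```

Entries placed after the terminator are never reached, which mirrors how a
C loop over the array would stop at the ``{0, NULL}`` entry.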

When using multi-phase initialization, the *m_name* field of PyModuleDef will
not be used during importing; the module name will be taken from the ModuleSpec.

Before it is returned from PyInit_*, the PyModuleDef object must be initialized
using the newly added PyModuleDef_Init function. This sets the object type
(which cannot be done statically on certain compilers), refcount, and internal
bookkeeping data (m_index).
For example, an extension module "example" would be exported as::

    static PyModuleDef example_def = {...};

    PyMODINIT_FUNC PyInit_example(void) {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module created
from it – usually, it will be declared statically.

Pseudo-code Overview

Here is an overview of how the modified importers will operate.
Details such as logging or handling of errors and invalid states
are left out, and C code is presented with a concise Python-like syntax.

The framework that calls the importers is explained in
PEP 451 [#pep-0451-loading]_.

::

    importlib/_bootstrap.py:

        class BuiltinImporter:
            def create_module(self, spec):
                module = _imp.create_builtin(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                _load_module_shim(self, name)


    importlib/_bootstrap_external.py:

        class ExtensionFileLoader:
            def create_module(self, spec):
                module = _imp.create_dynamic(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                _load_module_shim(self, name)

    Python/import.c (the _imp module):

        def create_dynamic(spec):
            name = spec.name
            path = spec.origin

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            return _PyImport_LoadDynamicModuleWithSpec(spec)

        def exec_dynamic(module):
            if not isinstance(module, types.ModuleType):
                # non-modules are skipped -- PyModule_GetDef fails on them
                return

            def = PyModule_GetDef(module)
            state = PyModule_GetState(module)
            if state is NULL:
                PyModule_ExecDef(module, def)

        def create_builtin(spec):
            name = spec.name

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            for initname, initfunc in PyImport_Inittab:
                if name == initname:
                    m = initfunc()
                    if isinstance(m, PyModuleDef):
                        def = m
                        return PyModule_FromDefAndSpec(def, spec)
                    else:
                        # fall back to single-phase initialization
                        module = m
                        _PyImport_FixupExtensionObject(module, name, name)
                        return module


    Python/importdl.c:

        def _PyImport_LoadDynamicModuleWithSpec(spec):
            path = spec.origin
            package, dot, name = spec.name.rpartition('.')

            # see the "Export Hook Name" section for export_hook_name
            hook_name = export_hook_name(name)

            # call platform-specific function for loading exported function
            # from shared library
            exportfunc = _find_shared_funcptr(hook_name, path)

            m = exportfunc()
            if isinstance(m, PyModuleDef):
                def = m
                return PyModule_FromDefAndSpec(def, spec)

            module = m

            # fall back to single-phase initialization


    Objects/moduleobject.c:

        def PyModule_FromDefAndSpec(def, spec):
            name = spec.name
            create = None
            for slot, value in def.m_slots:
                if slot == Py_mod_create:
                    create = value
            if create:
                m = create(spec, def)
            else:
                m = PyModule_New(name)

            if isinstance(m, types.ModuleType):
                m.md_state = None
                m.md_def = def

            if def.m_methods:
                PyModule_AddFunctions(m, def.m_methods)
            if def.m_doc:
                PyModule_SetDocString(m, def.m_doc)
            return m

        def PyModule_ExecDef(module, def):
            if isinstance(module, types.ModuleType):
                if module.md_state is NULL:
                    # allocate a block of zeroed-out memory
                    module.md_state = _alloc(def.m_size)

            if def.m_slots is NULL:
                return

            for slot, value in def.m_slots:
                if slot == Py_mod_exec:
                    value(module)

Module Creation Phase

Creation of the module object – that is, the implementation of
ExecutionLoader.create_module – is governed by the Py_mod_create slot.

The Py_mod_create slot

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.

This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state and processing of execution slots.
If something other than a ModuleType subclass is returned, no execution slots
may be defined; if any are, a SystemError is raised.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively) may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in particular
to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal
module object using PyModule_New. The name is taken from *spec*.

Post-creation steps

If the Py_mod_create function returns an instance of types.ModuleType
or a subclass (or if a Py_mod_create slot is not present), the import
machinery will associate the PyModuleDef with the module.
This also makes the PyModuleDef accessible to the execution phase, the
PyModule_GetDef function, and garbage collection routines (traverse,
clear, free).

If the Py_mod_create function does not return a module subclass, then m_size
must be 0, and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the
module object, regardless of its type:

* The docstring is set from m_doc, if non-NULL.
* The module's functions are initialized from m_methods, if any.
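
The creation phase and the post-creation steps can be modeled in Python as
follows. This is a rough sketch under stated assumptions, not the actual
implementation: ``ModuleDef`` is a hypothetical stand-in for PyModuleDef,
the slot ID values are chosen for illustration, and the checks on
m_traverse, m_clear and m_free are left out.

```python
import types
from importlib.machinery import ModuleSpec

Py_mod_create = 1  # slot ID values are assumptions for this sketch
Py_mod_exec = 2


class ModuleDef:
    """Hypothetical stand-in for PyModuleDef (only the fields used here)."""

    def __init__(self, slots=(), doc=None, methods=None, size=0):
        self.m_slots = list(slots)
        self.m_doc = doc
        self.m_methods = dict(methods or {})
        self.m_size = size


def module_from_def_and_spec(mdef, spec):
    """Model of the creation phase and the post-creation steps."""
    creators = [value for slot, value in mdef.m_slots if slot == Py_mod_create]
    if len(creators) > 1:
        raise SystemError("multiple Py_mod_create slots")
    if creators:
        module = creators[0](spec, mdef)
    else:
        module = types.ModuleType(spec.name)  # models PyModule_New

    if isinstance(module, types.ModuleType):
        module._md_def = mdef  # associate the definition with the module
    else:
        # Non-module objects cannot carry state or run execution slots.
        if mdef.m_size != 0:
            raise SystemError("m_size must be 0 for non-module objects")
        if any(slot == Py_mod_exec for slot, _ in mdef.m_slots):
            raise SystemError("execution slots need a module object")

    # Initial attributes are set regardless of the object's type.
    if mdef.m_doc is not None:
        module.__doc__ = mdef.m_doc
    for name, func in mdef.m_methods.items():
        setattr(module, name, func)
    return module


spec = ModuleSpec("spam", loader=None)
spam_def = ModuleDef(doc="Utilities for cooking spam",
                     methods={"cook": lambda: "fried spam"})
spam = module_from_def_and_spec(spam_def, spec)
print(spam.__name__, spam.__doc__, spam.cook())
```

Import-related attributes such as ``__loader__`` are deliberately not set
here, since that is the job of the import machinery rather than of the
creation step.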

Module Execution Phase

Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.

The execution phase is done on the PyModuleDef associated with the module
object. For objects that are not a subclass of PyModule_Type (for which
PyModule_GetDef would fail), the execution phase is skipped.

Execution slots may be specified multiple times, and are processed in the order
they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.

Pre-Execution steps

Before processing the execution slots, per-module state is allocated for the
module. From this point on, per-module state is accessible through
PyModule_GetState.

The Py_mod_exec slot

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.

The function must return ``0`` on success, or, on error, set an exception and
return ``-1``.
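
As a rough illustration of the execution phase, the following Python model
runs execution slots in array order and stops at the first one that reports
failure. It is a sketch only: the slot ID value, the ``exec_def`` helper and
the two example functions are assumptions made for this example, and a real
Py_mod_exec function would also set an exception before returning ``-1``.

```python
import types

Py_mod_exec = 2  # slot ID value is an assumption for this sketch


def exec_def(module, m_slots):
    """Model of the execution phase: run Py_mod_exec slots in array order."""
    if not isinstance(module, types.ModuleType):
        return 0                      # non-modules: execution is skipped
    for slot, value in m_slots:
        if slot == 0:                 # {0, NULL} terminator
            break
        if slot == Py_mod_exec and value(module) != 0:
            return -1                 # stop at the first failing slot
    return 0


def add_food(module):
    module.food = "spam"
    return 0


def add_menu(module):
    # Relies on add_food having run first, so slot order matters.
    module.menu = [module.food, "eggs"]
    return 0


spam = types.ModuleType("spam")
status = exec_def(spam, [(Py_mod_exec, add_food),
                         (Py_mod_exec, add_menu),
                         (0, None)])
print(status, spam.menu)
```

Because the second function reads an attribute set by the first, swapping
the two slots would fail, which is why the processing order is part of the
specification.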

If a Py_mod_exec function replaces the module's entry in sys.modules, the new object
will be used and returned by importlib machinery after all execution slots
are processed. This is a feature of the import machinery itself.
The slots themselves are all processed using the module returned from the
creation phase; sys.modules is not consulted during the execution phase.
(Note that for extension modules, implementing Py_mod_create is usually
a better solution for using custom module objects.)
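
The sys.modules replacement behavior is a property of the import machinery
and can be observed with an ordinary Python module, without any C code. The
sketch below is such a pure-Python analogue, not an extension module; the
module name and its contents are hypothetical and exist only for the
demonstration.

```python
import importlib
import os
import sys
import tempfile
import types

# Hypothetical module name and contents, chosen only for this demonstration.
MODULE_NAME = "pep489_demo_replaced"
SOURCE = (
    "import sys, types\n"
    "food = 'spam'\n"
    "sys.modules[__name__] = types.SimpleNamespace(food='eggs')\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    # Write the module to a temporary directory and make it importable.
    with open(os.path.join(tmpdir, MODULE_NAME + ".py"), "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, tmpdir)
    try:
        importlib.invalidate_caches()
        imported = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(tmpdir)

# The import returns the replacement object, not the original module.
print(type(imported).__name__, imported.food)
```

The module body still ran against the original module object; only the
result handed back to the importer comes from sys.modules.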

Legacy Init

The backwards-compatible single-phase initialization continues to be supported.
In this scheme, the PyInit function returns a fully initialized module rather
than a PyModuleDef object.
In this case, the PyInit hook implements the creation phase, and the execution
phase is a no-op.

Modules that need to work unchanged on older versions of Python should stick to
single-phase initialization, because the benefits of multi-phase initialization
are not available on those versions.

Here is an example of a module that supports multi-phase initialization,
and falls back to single-phase when compiled for an older version of CPython.
It is included mainly as an illustration of the changes needed to enable
multi-phase init::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        PyModule_AddStringConstant(module, "food", "spam");
        return 0;
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,                                       /* m_reload */
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        /* multi-phase initialization: return the definition */
        return PyModuleDef_Init(&spam_def);
    #else
        /* single-phase fallback: create and initialize the module here */
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }

Built-In modules

Any extension module can be used as a built-in module by linking it into
the executable, and including it in the inittab (either at runtime with
PyImport_AppendInittab, or at configuration time, using tools like *freeze*).

To keep this possibility, all changes to extension module loading introduced
in this PEP will also apply to built-in modules.
The only exception is non-ASCII module names, explained below.

Subinterpreters and Interpreter Reloading

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly,
avoiding the issues mentioned in Python documentation [#subinterpreter-docs]_.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except built-in types
with no mutable or user-settable class attributes.

Functions incompatible with multi-phase initialization

The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL *m_slots* pointer.
The function doesn't have access to the ModuleSpec object necessary for
multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*.
PyState registration is disabled because multiple module objects may be created
from the same PyModuleDef.

Module state and C-level callbacks

Due to the unavailability of PyState_FindModule, any function that needs access
to module-level state (including functions, classes or exceptions defined at
the module level) must receive a reference to the module object (or the
particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for
the new mechanism to be useful to all modules. Proper fixes have been discussed
on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment,
not good candidates for porting to the new mechanism.

New Functions

A new function and macro implementing the module creation phase will be added.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added.
This allocates per-module state (if not allocated already), and *always*
processes execution slots. The import machinery calls this function when
a module is executed, unless the module is being reloaded::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object.
This idempotent function fills in the type, refcount, and module index.
It returns its argument cast to PyObject*, so it can be returned directly
from a PyInit function::

    PyObject * PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and
methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)

Export Hook Name

As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named
PyInit_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named
PyInitU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").

In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix


Examples:

=============  ===================
Module name    Init hook name
=============  ===================
spam           PyInit_spam
lančmít        PyInitU_lanmt_2sa6t
スパム          PyInitU_zck5b2b
=============  ===================

For modules with non-ASCII names, single-phase initialization is not supported.

In the initial implementation of this PEP, built-in modules with non-ASCII
names will not be supported.

Module Reloading

Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.

Multiple modules in one library

To support multiple Python modules in one shared library, the library can
export additional PyInit* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
but not to *find* them. (This is a limitation of the loader mechanism,
which this PEP does not try to modify.)
To work around the lack of a suitable finder, code like the following
can be used::

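    import importlib.machinery
    import importlib.util

    def load_module_from_library(name, path):
        # name: module name to load; path: path to the shared library
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module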

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.

Testing and initial implementations

For testing, a new built-in module ``_testmultiphase`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use single-phase
initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase
initialization as part of the initial implementation.

Summary of API Changes and Additions

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.

The internal ``_imp`` module will have backwards incompatible changes:
``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added;
``init_builtin`` and ``load_dynamic`` will be removed.

The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will
be replaced by backwards-compatible shims.

Backwards Compatibility

Existing modules will continue to be source- and binary-compatible with new
versions of Python.
Modules that use multi-phase initialization will not be compatible with
versions of Python that do not implement this PEP.

The functions ``init_builtin`` and ``load_dynamic`` will be removed from
the ``_imp`` module (but not from the ``imp`` module).

All changed loaders (``BuiltinImporter`` and ``ExtensionFileLoader``) will
remain backwards-compatible; the ``load_module`` method will be replaced by
a shim.

Internal functions of Python/import.c and Python/importdl.c will be removed.
(Specifically, these are ``_PyImport_GetDynLoadFunc``,
``_PyImport_GetDynLoadWindows``, and ``_PyImport_LoadDynamicModule``.)

Possible Future Extensions

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.

Some extension modules export many constants; for example, _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation

A work-in-progress implementation is available in a GitHub repository [#gh-repo]_;
a patchset is at [#gh-patch]_.

Previous Approaches

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would be then called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization are broken into distinct steps.
It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At this time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python modules
are initialized, but it was later recognized that this isn't an important goal.
The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to PyInit,
where PyInit was used for the existing scheme, and PyModuleExport for multi-phase.
However, not being able to determine the hook name based on module name
complicated automatic generation of PyImport_Inittab by tools like freeze.
Keeping only the PyInit hook name, even if it's not entirely appropriate for
exporting a definition, yielded a much simpler solution.


References

.. [#pep-0451-attributes]

.. [#stefans_protopep]

.. [#nicks-prototype]

.. [#rfc-3492]

.. [#gh-repo]

.. [#gh-patch]

.. [#findmodule-discussion]

.. [#pep-0451-loading]

.. [#subinterpreter-docs]


Copyright

This document has been placed in the public domain.
