[Python-Dev] Update - Re: Make extension module initialisation more like Python module initialisation

Stefan Behnel stefan_ml at behnel.de
Thu Nov 8 15:32:34 CET 2012


Hi,

here's an updated proposal, adopting Marc-Andre's improvement that uses a
new field in the PyModuleDef struct to register the callback. Note that
this change no longer preserves binary compatibility with existing
extensions, which may or may not be acceptable for Python 3.4.

Stefan


The problem
===========

Python modules and extension modules are not set up in the same way.
For Python modules, the module is created and set up first, then the module
code is executed. For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
the initialisation. This means that the init code knows neither the
__file__ it is being loaded from nor its package, i.e. its fully qualified
module name (FQMN). This hinders relative imports and resource loading. In
Py3, the module is also not yet in sys.modules while its init function
runs, which means that a (potentially transitive) re-import of the module
will really try to import it again and thus run into an infinite loop when
it executes the module init function again. And without the FQMN, it's not
trivial to correctly add the module to sys.modules either.

We specifically run into this for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of a FQMN and correct
file path hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

The proposal
============

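I propose to split the extension module initialisation into two steps in
Python 3.4, in a way that is source compatible with existing extension
modules. As noted above, the new PyModuleDef field does give up binary
compatibility.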

Step 1: The current module init function can be reduced to just creating
the module instance and returning it (and potentially doing some simple C
level setup). Additionally, and this is the new part, the module init code
can register a C callback function in its PyModuleDef struct that will be
called after setting up the module.

Step 2: The shared library importer receives the module instance from the
module init function, adds __file__, __path__, __package__ and friends to
the module dict, and then checks for the callback. If non-NULL, it calls it
to continue the module initialisation by user code.
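
To make the two steps concrete, here is a rough standalone C sketch of the
importer side. It deliberately does not use the CPython API: ModuleStub,
ModuleDefStub, load_extension() and the example_* names are hypothetical
stand-ins for the module object, PyModuleDef, the shared library importer
and a user module. It also assumes that a negative return value from the
callback signals failure, which the proposal does not spell out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ModuleStub ModuleStub;

/* Stand-in for the proposed post-setup callback (simplified signature). */
typedef int (*init_callback)(ModuleStub *module);

/* Stand-in for PyModuleDef, reduced to the fields needed here. */
typedef struct {
    const char *m_name;
    init_callback m_init;   /* NULL if the module registers no callback */
} ModuleDefStub;

/* Stand-in for a module object and the attributes the importer sets. */
struct ModuleStub {
    const ModuleDefStub *def;
    const char *file;       /* models __file__ */
    const char *package;    /* models __package__ */
    int initialised;
};

/* Stand-in for the signature of a PyInit_* function. */
typedef ModuleStub *(*create_func)(void);

/* Importer side: run step 1, then step 2. Returns 0 on success, -1 on error. */
int load_extension(create_func create, const char *file,
                   const char *package, ModuleStub **out)
{
    ModuleStub *module;
    assert(create != NULL && out != NULL);

    module = create();              /* step 1: only creates the module */
    if (module == NULL)
        return -1;

    module->file = file;            /* step 2: importer fills in metadata */
    module->package = package;

    /* ... and only then hands control back to user code, if requested. */
    if (module->def->m_init != NULL && module->def->m_init(module) < 0)
        return -1;

    *out = module;
    return 0;
}

/* User side: the callback can rely on the metadata being present. */
int example_init(ModuleStub *module)
{
    if (module->file == NULL || module->package == NULL)
        return -1;
    module->initialised = 1;
    return 0;
}

const ModuleDefStub example_def = { "example", example_init };
ModuleStub example_module;          /* file scope, so zero-initialised */

ModuleStub *example_create(void)
{
    example_module.def = &example_def;
    return &example_module;
}
```

The point of the split is visible in example_init(): it runs after the
importer has filled in the file and package information, which is exactly
what the current single init function cannot rely on.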

The callback
============

The callback is defined as follows::

    typedef int (*PyModule_init_callback)(PyObject* the_module,
                                          PyModuleInitContext* context);

"PyModuleInitContext" is a struct that is meant mostly for making the
callback more future proof by allowing additional parameters to be passed
in. For now, I can see a use case for the following fields::

    typedef struct PyModuleInitContext {
        char* module_name;
        char* qualified_module_name;
    } PyModuleInitContext;

Both names are encoded in UTF-8. As for the file path, I consider it best
to retrieve it from the module's __file__ attribute as a Python string
object to reduce filename encoding problems.

Note that this struct argument is not strictly required (it could be a
simple "inquiry" function), but given that this proposal would have been
much simpler if the module init function had accepted such an argument in
the first place, I consider it a good idea not to let this chance pass by
again. The counter arguments would be "keep it simple" and "we already pass
in the whole module (and its dict) anyway". Up for debate!
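
As an illustration of what user code might do with the context, here is a
small sketch of a callback that derives the package name from the
qualified module name. PyModuleInitContext is declared locally as proposed
above, since it is not part of any CPython release, and PyObject is only
forward-declared so that the sketch compiles without Python.h. The helper
package_of(), the buffer example_package and the callback name are
hypothetical, as is the convention that a negative return value means
failure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Opaque forward declaration, so that Python.h is not needed here. */
typedef struct _object PyObject;

/* Context struct as proposed above. */
typedef struct PyModuleInitContext {
    char *module_name;             /* UTF-8, e.g. "mod" */
    char *qualified_module_name;   /* UTF-8, e.g. "pkg.sub.mod" */
} PyModuleInitContext;

/* Hypothetical helper: copy the package part of a qualified module name
   into buf. A top-level module yields the empty string.
   Returns 0 on success, -1 if buf is too small. */
int package_of(const char *qualified_name, char *buf, size_t buflen)
{
    const char *last_dot = strrchr(qualified_name, '.');
    size_t len = (last_dot != NULL) ? (size_t)(last_dot - qualified_name) : 0;

    if (len >= buflen)
        return -1;
    memcpy(buf, qualified_name, len);
    buf[len] = '\0';
    return 0;
}

/* Hypothetical module-level state filled in by the callback. */
char example_package[256];

/* A callback matching the proposed signature. It ignores the module
   object and only records the package it was imported into. */
int example_init_callback(PyObject *the_module, PyModuleInitContext *context)
{
    (void)the_module;
    assert(context != NULL);
    return package_of(context->qualified_module_name,
                      example_package, sizeof example_package);
}
```

Real init code would of course go on to use the module object as well, e.g.
to read __file__ as suggested above. The sketch only shows that the FQMN
alone is enough to anchor relative imports.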

The registration of the callback uses a new field "m_init" in the
PyModuleDef struct::

    typedef struct PyModuleDef{
      PyModuleDef_Base m_base;
      const char* m_name;
      const char* m_doc;
      Py_ssize_t m_size;
      PyMethodDef *m_methods;
      inquiry m_reload;
      traverseproc m_traverse;
      inquiry m_clear;
      freefunc m_free;          /* --- original fields up to here */
      PyModule_init_callback m_init;   /* post-setup init callback */
    } PyModuleDef;
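
One consequence of putting the new field last is worth spelling out: C
zero-initialises struct members that an initialiser list leaves out, so
existing module definitions would get m_init == NULL after a plain
recompile. The following sketch demonstrates this with ModuleDefStub, a
simplified, hypothetical stand-in for the extended struct (m_base is
omitted and the other fields use placeholder types). The names hook_stub,
old_style_def, new_style_def, new_style_init and has_init_callback are
made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _object PyObject;                         /* opaque */
typedef struct PyModuleInitContext PyModuleInitContext;  /* opaque */

typedef int (*PyModule_init_callback)(PyObject *the_module,
                                      PyModuleInitContext *context);

/* Placeholder for inquiry, traverseproc and freefunc. */
typedef void (*hook_stub)(void);

/* Simplified stand-in for the extended PyModuleDef. */
typedef struct {
    const char *m_name;
    const char *m_doc;
    long m_size;
    void *m_methods;
    hook_stub m_reload;
    hook_stub m_traverse;
    hook_stub m_clear;
    hook_stub m_free;                 /* original fields up to here */
    PyModule_init_callback m_init;    /* post-setup init callback */
} ModuleDefStub;

int new_style_init(PyObject *the_module, PyModuleInitContext *context)
{
    (void)the_module;
    (void)context;
    return 0;
}

/* Written against the old layout: no initialiser for m_init, so the
   compiler sets it to a null pointer. */
ModuleDefStub old_style_def = {
    "old", "docs", -1, NULL, NULL, NULL, NULL, NULL
};

/* Written against the new layout: registers the callback statically. */
ModuleDefStub new_style_def = {
    "new", "docs", -1, NULL, NULL, NULL, NULL, NULL, new_style_init
};

/* What the importer's check for the callback boils down to. */
int has_init_callback(const ModuleDefStub *def)
{
    assert(def != NULL);
    return def->m_init != NULL;
}
```

This only covers source compatibility. A binary built against the shorter
struct has no m_init slot at all, which is why the importer could not
safely read the field from extensions compiled for Py3.3.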

Implementation
==============

The implementation requires local changes to the extension module importer
and a new field in the PyModuleDef struct.

Open questions
==============

It is not clear how to handle extensions that register more than
one module in their module init function, e.g. compiled packages. One
possibility would be to leave the setup to the user, who would have to know
all FQMNs anyway in this case, although not the import file path.
Alternatively, the import machinery could use a stack to remember for which
modules a callback was registered during the last init function call, set
up all of them and then call their callbacks. It's not clear if this meets
the intention of the user. It's not guaranteed that all of these modules
will be related to the module that registered them, in the sense that they
should receive the same setup. The best way to fix this correctly might be
to make users pass the setup explicitly into the module creation functions
in Python 4 (see alternatives below), so that the setup and sys.modules
registration can happen directly at this point.
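
For discussion, the stack based variant could look roughly like the
following standalone sketch. Everything in it is hypothetical: ModuleStub
stands in for a module object, remember_module() for a hook in the module
creation function, run_pending() for the importer code that runs after the
init function has returned, and mark_initialised() for a user callback. The
fixed capacity, calling the callbacks in registration order and treating a
negative return value as failure are assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

typedef struct ModuleStub ModuleStub;
typedef int (*init_callback)(ModuleStub *module);

struct ModuleStub {
    const char *name;
    const char *file;       /* models __file__ */
    init_callback init;     /* NULL if no callback was registered */
    int initialised;
};

/* Modules created during the current init function call that still
   need their callback to be run. */
static ModuleStub *pending[MAX_PENDING];
static size_t pending_count = 0;

/* Called whenever a module is created. Returns 0 on success, -1 if
   the stack is full. Modules without a callback are not recorded. */
int remember_module(ModuleStub *module)
{
    assert(module != NULL);
    if (module->init == NULL)
        return 0;
    if (pending_count == MAX_PENDING)
        return -1;
    pending[pending_count++] = module;
    return 0;
}

/* Called by the importer after the init function returned. Sets up all
   recorded modules first, then runs their callbacks. Returns the number
   of callbacks run, or -1 as soon as one of them fails. */
int run_pending(const char *file)
{
    size_t i;
    int called = 0;
    int failed = 0;

    for (i = 0; i < pending_count; i++)
        pending[i]->file = file;    /* same setup for every module */

    for (i = 0; i < pending_count; i++) {
        if (pending[i]->init(pending[i]) < 0) {
            failed = 1;
            break;
        }
        called++;
    }
    pending_count = 0;              /* the stack is per init call */
    return failed ? -1 : called;
}

/* Example callback: succeeds only if the importer did its setup. */
int mark_initialised(ModuleStub *module)
{
    if (module->file == NULL)
        return -1;
    module->initialised = 1;
    return 0;
}
```

The first loop in run_pending() is where the concern above shows up: every
recorded module receives the same file path, whether or not it is actually
related to the module that was being imported.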

Alternatives
============

1) It would be possible to make extension modules optionally export another
symbol, e.g. "PyInit2_modulename", that the shared library loader would
call in addition to the required function "PyInit_modulename". This would
preserve binary compatibility. The drawback is that it also makes it easier
to write broken code, because a Python version or implementation that does
not support this second symbol would simply never call it, without raising
an error. The new struct field, by contrast, makes the build fail if it is
not supported.

2) The callback could be made available as a Python function in the module
dict, thus also removing the need for an explicit registration API.
However, this approach would add overhead to both sides, the importer code
and the user provided module init code, as it would require additional
dictionary handling and the implementation of a one-time Python function in
user code. It would also suffer from the problem that missing support in
the runtime would pass silently.

3) The original proposal used a new C-API function to register the callback
explicitly, as opposed to extending the PyModuleDef struct. This has the
advantage of preserving binary compatibility with existing Py3.3
extensions. It has the disadvantage of adding another indirection to the
setup procedure where a static function pointer would suffice.

4) Pass a new context argument into the module init function that contains
all information necessary to properly and completely set up the module at
creation time. This would provide a much simpler and cleaner solution than
the one proposed here. However, it will not be possible before Python 4 as
it breaks backwards compatibility with all existing extension modules at
both the source and binary level.


