[Python-Dev] Make extension module initialisation more like Python module initialisation

Stefan Behnel stefan_ml at behnel.de
Tue Aug 6 07:02:30 CEST 2013


Hi,

let me revive and summarize this old thread.

Stefan Behnel, 08.11.2012 13:47:
> I suspect that this will be put into a proper PEP at some point, but I'd
> like to bring this up for discussion first. This came out of issues 13429
> and 16392.
> 
> http://bugs.python.org/issue13429
> 
> http://bugs.python.org/issue16392
> 
> 
> The problem
> ===========
> 
> Python modules and extension modules are not being set up in the same way.
> For Python modules, the module is created and set up first, then the module
> code is being executed. For extensions, i.e. shared libraries, the module
> init function is executed straight away and does both the creation and
> initialisation. This means that it knows neither the __file__ it is being
> loaded from nor its package (i.e. its FQMN). This hinders relative imports
> and resource loading. In Py3, it's also not being added to sys.modules,
> which means that a (potentially transitive) re-import of the module will
> really try to reimport it and thus run into an infinite loop when it
> executes the module init function again. And without the FQMN, it's not
> trivial to correctly add the module to sys.modules either.
> 
> We specifically run into this for Cython generated modules, for which it's
> not uncommon that the module init code has the same level of complexity as
> that of any 'regular' Python module. Also, the lack of a FQMN and correct
> file path hinders the compilation of __init__.py modules, i.e. packages,
> especially when relative imports are being used at module init time.

The outcome of this discussion was that the extension module import
protocol needs to change in order to provide all necessary information to
the module init function.

Brett Cannon proposed to move the module object creation into the extension
module importer, i.e. outside of the user provided module init function.
CPython would then load the extension module, create and initialise the
module object (set __file__, __name__, etc.) and pass it into the module
init function.

I proposed to make the PyModuleDef struct the new entry point instead of
just a generic C function, as that would give the module importer all
necessary information about the module to create the module object. The
only missing bit is the entry point for the new module init function.

Nick Coghlan objected to the proposal of simply extending PyModuleDef with
an initialiser function, as the struct is part of the stable ABI.

Alternatives I see:

1) Expose a struct that points to the extension module's PyModuleDef struct
and the init function and expose that struct instead.

2) Expose both the PyModuleDef and the init function as public symbols.

3) Provide a public C function as entry point that returns both a
PyModuleDef pointer and a module init function pointer.

4) Change the m_init function pointer in PyModuleDef_base from func(void)
to func(PyObject*) iff the PyModuleDef struct is exposed as a public symbol.

5) Duplicate PyModuleDef and adapt the new one as in 4).

Alternatives 1) and 2) only differ marginally by the number of public
symbols being exposed. 3) has the advantage of supporting more advanced
setups, e.g. heap allocation for the PyModuleDef struct. 4) is a hack and
has the disadvantage that the signature of the module init function cannot
be stored across reinitialisations (PyModuleDef has no "flags" or "state"
field to remember it). 5) would fix that, i.e. we could add a proper
pointer to the new module init function as well as a flags field for future
extensions. A similar effect could be achieved by carefully designing the
struct in 1).

I think 1-3 are all reasonable ways to do this, although I don't think 3)
will be necessary. 5) would be a clean fix, but has the disadvantage of
duplicating an entire struct just to change one field in it.

I'm currently leaning towards 1), with a struct that points to PyModuleDef,
module init function and a flags field for future extensions. I understand
that this would need to become part of the stable ABI, so explicit
extensibility is important to keep up backwards compatibility.

Opinions?

Stefan




More information about the Python-Dev mailing list