[Python-Dev] redesigning the extension module initialisation protocol (was: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others))

Stefan Behnel stefan_ml at behnel.de
Sun Aug 11 15:52:52 CEST 2013

Nick Coghlan, 11.08.2013 15:19:
> On 11 Aug 2013 09:02, "Stefan Behnel" wrote:
>>> BTW, this already suggests a simple module initialisation interface. The
>>> extension module would expose a function that returns a module type, and
>>> the loader/importer would then simply instantiate that. Nothing else is
>>> needed.
>> Actually, strike the word "module type" and replace it with "type". Is
>> there really a reason why Python needs a module type at all? I mean, you
>> can stick arbitrary objects in sys.modules, so why not allow arbitrary
>> types to be returned by the module creation function?
> That's exactly what I have in mind, but the way extension module imports
> currently work means we can't easily do it just yet. Fortunately, importlib
> means we now have some hope of fixing that :)

Well, what do we need? We don't need to care about existing code, as long
as the current scheme is only deprecated and not deleted. That won't happen
before Py4 anyway. New code would simply export a different symbol when
compiled for a CPython version that supports it; that symbol points to the
function that returns the type.
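To make the fallback concrete, here is a rough Python model of an importer that prefers the new entry point and falls back to the legacy one. The `PyModuleType_<name>` symbol prefix is purely hypothetical (no such symbol exists). `PyInit_<name>` is the existing Python 3 entry point. The `exports` namespace stands in for the symbol table of a loaded shared library.

```python
from types import ModuleType, SimpleNamespace


def load_extension(name, exports):
    """Model of an importer choosing between two entry points.

    `exports` stands in for the symbols exported by a shared library.
    """
    # Hypothetical new-style symbol: a function that returns a *type*.
    new_hook = getattr(exports, "PyModuleType_" + name, None)
    if new_hook is not None:
        module_type = new_hook()
        return module_type(name)  # the importer does the instantiation
    # Existing (to be deprecated) scheme: returns a fully built module.
    legacy_init = getattr(exports, "PyInit_" + name)
    return legacy_init()


class SpamModule:
    def __init__(self, name, **kwargs):
        self.__name__ = name


new_style = SimpleNamespace(PyModuleType_spam=lambda: SpamModule)
old_style = SimpleNamespace(PyInit_spam=lambda: ModuleType("spam"))

assert isinstance(load_extension("spam", new_style), SpamModule)
assert isinstance(load_extension("spam", old_style), ModuleType)
```

Old extensions keep working unchanged, because the importer only takes the new path when the new symbol is present.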

Then, there's already the PyType_Copy() function, which can be used to
create a heap type from a statically defined type. So extension modules can
simply define an (arbitrary) additional type in any way they see fit, copy
it to the heap, and return it.
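A loose Python-level analogy for "define a type, put a fresh copy on the heap, return it" follows. The three-argument `type(name, bases, namespace)` call builds a new class object on every invocation. It does not model the C function mentioned above. `create_spam_type` and `SpamModule` are hypothetical names.

```python
def create_spam_type():
    """Hypothetical module-creation function that returns a new type."""

    def __init__(self, name, **kwargs):
        self.__name__ = name  # kwargs accepted but unused for now

    def ham(self):
        return "eggs"

    # A fresh class object is created on each call, so separate loads
    # do not share type-level state.
    return type("SpamModule", (object,), {"__init__": __init__, "ham": ham})


first, second = create_spam_type(), create_spam_type()
assert first is not second
assert first("spam").ham() == "eggs"
```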

Next, we need to define a signature for the type's __init__() method. This
can be done in a future-proof way by allowing arbitrary keyword arguments
to be added, i.e. such a type must have a signature like

    def __init__(self, currently, used, pos, args, **kwargs):
        pass  # positional names are placeholders; kwargs reserved for later

and simply ignore kwargs for now.
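A minimal sketch of why the `**kwargs` catch-all keeps this future-proof follows. It assumes, purely for illustration, that the only positional argument used today is the module name. The `loader` and `some_future_option` keywords stand in for whatever a later importer might add.

```python
class SpamModule:
    # Only `name` is used today. Extra keyword arguments that a future
    # importer might pass are accepted and silently ignored.
    def __init__(self, name, **kwargs):
        self.__name__ = name


# What a current importer would pass...
m1 = SpamModule("spam")
# ...and what a hypothetical later importer might pass, without breaking
# modules written against the older protocol.
m2 = SpamModule("spam", loader=None, some_future_option=True)

assert m1.__name__ == m2.__name__ == "spam"
```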

Actually, we may get away with passing only a few arguments here if
we allow the importer to add stuff to the type's dict in between,
specifically __file__, __path__ and friends, so that they are available
before the type gets instantiated. Not sure if this is a good idea, but it
would at least relieve the user from having to copy these things over from
some kind of context or whatever we might want to pass in.
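The "set it on the type first" variant could look roughly like this in Python terms. `instantiate_with_type_context` is a hypothetical importer helper, and the file path is made up. The point is that `__file__` (and `__path__` for packages) is already readable inside `__init__`, so nothing has to be passed in or copied over.

```python
class SpamModule:
    def __init__(self, name, **kwargs):
        self.__name__ = name
        # Set on the type by the importer, so already visible here.
        self.origin = self.__file__


def instantiate_with_type_context(module_type, name, filename, path=None):
    # Hypothetical importer step: store load context on the *type*
    # before creating the instance.
    module_type.__file__ = filename
    if path is not None:  # only packages get a __path__
        module_type.__path__ = path
    return module_type(name)


mod = instantiate_with_type_context(SpamModule, "spam", "/tmp/spam.so")
assert mod.origin == "/tmp/spam.so"
assert "__file__" in vars(SpamModule)  # the context lives on the type
```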

Alternatively, we could split the instantiation up between tp_new() and
tp_init(), and let the importer set stuff on the instance dict in between
the two. But given that this context won't actually change once the shared
library is loaded, the only reason to prefer modifying the instance instead
of the type would be to avoid requiring a tp_dict for the type. Open for
discussion, I guess.
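For comparison, here is a sketch of the split-instantiation alternative. At the Python level, tp_new() and tp_init() correspond to `__new__` and `__init__`, and the importer writes into the instance dict between the two calls. `instantiate_with_instance_context` is a hypothetical helper, and the path is made up.

```python
class SpamModule:
    def __init__(self, name, **kwargs):
        self.__name__ = name
        # Filled into the instance dict by the importer before __init__ ran.
        self.origin = self.__dict__["__file__"]


def instantiate_with_instance_context(module_type, name, filename):
    obj = module_type.__new__(module_type)  # allocation step (tp_new)
    obj.__file__ = filename                 # importer sets instance state
    module_type.__init__(obj, name)         # initialisation step (tp_init)
    return obj


mod = instantiate_with_instance_context(SpamModule, "spam", "/tmp/spam.so")
assert mod.origin == "/tmp/spam.so"
assert "__file__" not in vars(SpamModule)  # the type was left untouched
```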

Did I forget anything? Sounds simple enough to me so far.

