Re: [Python-Dev] redesigning the extension module initialisation protocol (was: Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others))

11 Aug 2013


On Sun, Aug 11, 2013 at 6:52 AM, Stefan Behnel wrote:
...
Nick Coghlan, 11.08.2013 15:19:
...
On 11 Aug 2013 09:02, "Stefan Behnel" wrote:
...
...
BTW, this already suggests a simple module initialisation interface. The
extension module would expose a function that returns a module type, and
the loader/importer would then simply instantiate that. Nothing else is
needed.
Actually, strike the word "module type" and replace it with "type". Is
there really a reason why Python needs a module type at all? I mean, you
can stick arbitrary objects in sys.modules, so why not allow arbitrary
types to be returned by the module creation function?
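A minimal Python sketch of this idea, assuming a hypothetical creation
function (create_module_type) and a hypothetical loader step (load), neither
of which exists in CPython. It only relies on the fact that sys.modules
accepts arbitrary objects:

```python
import sys


class Config:
    """An ordinary class, deliberately not a subclass of types.ModuleType."""

    def __init__(self):
        self.answer = 42


def create_module_type():
    # Hypothetical hook: stands in for the function an extension module
    # would export under the proposed protocol.
    return Config


def load(name):
    # Hypothetical loader step: instantiate whatever type was returned
    # and register the instance under the module name.
    module_type = create_module_type()
    instance = module_type()
    sys.modules[name] = instance
    return instance


load("fake_config")

# The import statement finds the entry in sys.modules and binds it as-is.
import fake_config

print(fake_config.answer)
```

The import machinery hands back whatever object is registered under the
name, which is why the loader does not strictly need a module type here.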
That's exactly what I have in mind, but the way extension module imports
currently work means we can't easily do it just yet. Fortunately, importlib
means we now have some hope of fixing that :)
Well, what do we need? We don't need to care about existing code, as long
as the current scheme is only deprecated and not deleted. That won't happen
before Py4 anyway. New code would simply export a different symbol when
compiling for a CPython that supports it, which points to the function that
returns the type.
Then, there's already the PyType_Copy() function, which can be used to
create a heap type from a statically defined type. So extension modules can
simply define an (arbitrary) additional type in any way they see fit, copy
it to the heap, and return it.
Next, we need to define a signature for the type's __init__() method. This
can be done in a future proof way by allowing arbitrary keyword arguments
to be added, i.e. such a type must have a signature like
def __init__(self, currently, used, pos, args, **kwargs)
and simply ignore kwargs for now.
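As a rough sketch of why the trailing **kwargs keeps the signature future
proof: the parameter names below (name, filename, spec) are placeholders for
illustration, not a proposed argument list.

```python
class ExtModule:
    def __init__(self, name, filename, **kwargs):
        # Arguments known today are consumed explicitly.
        self.__name__ = name
        self.__file__ = filename
        # Anything a later importer adds lands in kwargs and is ignored.


# An importer written against today's signature.
old_style = ExtModule("spam", "/opt/ext/spam.so")

# A later importer passing an extra (hypothetical) keyword argument
# still works with the unchanged type.
new_style = ExtModule("spam", "/opt/ext/spam.so", spec=None)
```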
Actually, we may get away with not passing all too many arguments here if
we allow the importer to add stuff to the type's dict in between,
specifically __file__, __path__ and friends, so that they are available
before the type gets instantiated. Not sure if this is a good idea, but it
would at least relieve the user from having to copy these things over from
some kind of context or whatever we might want to pass in.
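A small Python sketch of this variant, assuming a hypothetical importer
helper (instantiate_with_context) and a made-up file name. The importer
writes the context into the type's dict first, so __init__() can already
read it:

```python
class SpamModule:
    def __init__(self, **kwargs):
        # __file__ is found on the type, so no context argument is needed.
        self.data_dir = self.__file__.rsplit("/", 1)[0]


def instantiate_with_context(module_type, filename):
    # Hypothetical importer step: set attributes on the (heap) type,
    # then instantiate it.
    module_type.__file__ = filename
    return module_type()


module = instantiate_with_context(SpamModule, "/opt/ext/spam.so")
```

This only works for types whose dict is writable, which matches the
requirement above that the type lives on the heap.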
Alternatively, we could split the instantiation up between tp_new() and
tp_init(), and let the importer set stuff on the instance dict in between
the two. But given that this context won't actually change once the shared
library is loaded, the only reason to prefer modifying the instance instead
of the type would be to avoid requiring a tp_dict for the type. Open for
discussion, I guess.
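The alternative can be sketched in Python as well, with __new__() and
__init__() standing in for tp_new() and tp_init(). The helper name
(instantiate_in_two_steps) and the file name are assumptions for
illustration:

```python
class SpamModule:
    def __init__(self, **kwargs):
        # The instance dict was filled in before __init__() runs.
        self.origin = self.__file__


def instantiate_in_two_steps(module_type, filename):
    # Step 1: allocate the instance without initialising it (tp_new).
    instance = module_type.__new__(module_type)
    # In between: the importer sets the context on the instance dict.
    instance.__file__ = filename
    # Step 2: run the initialiser (tp_init).
    instance.__init__()
    return instance


module = instantiate_in_two_steps(SpamModule, "/opt/ext/spam.so")
```

Here the type itself stays untouched, at the cost of a two-phase
instantiation on the importer side.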
Did I forget anything? Sounds simple enough to me so far.
Out of curiosity - can we list actual use cases for this new design? The
previous thread, admittedly, deals with an esoteric corner case that comes
up in overly-clever tests. If we plan to seriously consider these changes -
and this appears to be worth a PEP - we need a list of actual advantages
over the current approach. It's not that a more conceptually pure design is
an insufficient reason, IMHO, but it would be interesting to hear about
other implications.

Eli

Eli Bendersky