[Import-SIG] PEP 489: Redesigning extension module loading

Petr Viktorin encukou at gmail.com
Sat Mar 21 19:37:20 CET 2015

On 03/21/2015 06:38 PM, Stefan Behnel wrote:
> Petr Viktorin schrieb am 21.03.2015 um 11:30:
>> It would be nice to extend runpy to handle Create+Exec modules. If this can
>> be pulled off, there'd be no need for Exec-only modules except the
>> convenience.
>> * module reloading is useless for extension modules – a changed version
>> can't be read from the disk, and correct reload behavior is another
>> corner case for authors to think about
> I think even shared library reloading could be achieved by using a filename
> scheme like "modulename-HASH.so" with a SHA hash of the source file or so,
> if the original module name is used to run the right module init function(s).
> The files would pile up in memory, though (there's usually no "dynamic
> unlinking"), so it's not a feature for production. I generally agree that
> there is little enough of a use case for reloading that it can safely be
> ignored.

I think this is something to build on top of what Python will provide. 
The "modulename-HASH.so" file wouldn't be easily locatable, so you'd 
need a "modulename.py" or "modulename.so" in front of it anyway, and 
that could just proxy to the real module (which stays non-reloadable). 
Implementation is up to any interested party :)
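
For illustration only, here is a rough sketch of what such a proxy could look like on the Python side. The helper names (find_hashed_library, load_hashed_extension) and the "pick the newest file" rule are my assumptions, not part of the PEP. It relies on importlib.machinery.ExtensionFileLoader as found in Python 3.5+, and on passing the original module name so that the right init function is looked up.

```python
import glob
import importlib.machinery
import importlib.util
import os


def find_hashed_library(directory, name):
    """Return the newest "<name>-<hash>.so" file in `directory`, or None.

    Hypothetical helper: the naming scheme and the "newest wins" rule
    are assumptions made for this sketch.
    """
    pattern = os.path.join(glob.escape(directory), glob.escape(name) + "-*.so")
    candidates = glob.glob(pattern)
    return max(candidates, key=os.path.getmtime) if candidates else None


def load_hashed_extension(directory, name):
    """Load the hashed shared library under the original module name."""
    path = find_hashed_library(directory, name)
    if path is None:
        raise ImportError("no hashed library found for " + name, name=name)
    # The loader derives the init function from `name`, not from the file name.
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

A "modulename.py" proxy could call load_hashed_extension() once and re-export the result; the real module itself stays non-reloadable.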

>> One thing I'm not clear about: what are the advantages of a module subclass
>> over a normal module with m_size>0?
> Properties and methods. In fact, you should rather ask why module objects
> have to be special in the first place.

Well, methods are already part of PyModuleDef, so that leaves properties.
Module objects are special mainly because they need space for C state, 
otherwise any object could be used (as in the current PEP).

> My initial idea was to implement *only* an extension type in extension
> modules, and have the library loader instantiate that. It would simply pass
> the module spec as constructor argument. However, Nick convinced me at the
> time that that's a) too inflexible and b) too cumbersome for manually
> written code. That eventually brought up the idea of splitting the
> initialisation into Create+Exec.

And after that, the current PEP is meant to discourage using Create as 
much as possible. But I see how it's useful to provide it.

>> Separating Create and Exec has these effects:
>> - Allowing you to implement just one and leave the rest to default
>> machinery. This is good.
>> - Allowing some time to pass between the calls to Create and Exec. This
>> might be useful for lazy loading, I guess.
>> - Allowing the loader or third-party code to modify the object between
>> the calls to Create and Exec. This is dangerous (for consenting adults who
>> don't mind the occasional segfault).
> Depends on what they do with the object. Setting attributes on it should be
> ok, for example. In fact, I would like to leave it to CPython to set
> attributes like "__name__" and "__file__" on it, because that simplifies
> the implementation of a Create function. From time to time, the module
> interface is extended with new attributes, so setting them externally
> avoids the need to adapt the user code each time.

I agree here, and if your module subclass doesn't support setting dunder 
attributes then you need a custom loader for it.

> However, an API helper function could be provided that copies attributes
> from the module spec to the 'module' object. Calling that is simple enough,
> and it would leave the responsibility for the evolution of the "standard
> module API" in CPython.

The import machinery does that between create and exec; I don't think an 
extra helper is necessary.
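
To make that concrete, here is a small pure-Python sketch of the same sequence. The loader and module classes are made up for the example and stand in for what an extension's Create and Exec functions would do; importlib.util.module_from_spec is the Python 3.5+ helper that calls create_module and then fills in the import-related attributes.

```python
import importlib.machinery
import importlib.util
import types


class StatefulModule(types.ModuleType):
    """Hypothetical module subclass, standing in for a C-level Create result."""


class CreateExecLoader:
    """Hypothetical loader with separate Create and Exec steps."""

    def create_module(self, spec):
        # Create: only build the object; no import attributes are set here.
        return StatefulModule(spec.name)

    def exec_module(self, module):
        # Exec: the import machinery has already attached __spec__ by now.
        module.seen_spec_name = module.__spec__.name


loader = CreateExecLoader()
spec = importlib.machinery.ModuleSpec("demo", loader)

# Create, followed by the machinery copying attributes from the spec.
module = importlib.util.module_from_spec(spec)
spec_set_before_exec = module.__spec__ is spec

# Exec runs afterwards and can rely on those attributes.
loader.exec_module(module)
```

The Create function never touches __spec__ itself, yet Exec can read it, which is why I don't think a separate copy helper is needed.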

>> - Allowing Exec to be called multiple times after Create, i.e. module
>> reloading. I don't think there is a use case (and for module-specific cases
>> it can be done in a separately exported function).
>> - Allowing Exec without the corresponding Create, i.e. loading into
>> arbitrary objects. This is cool, and it mimics what source modules can do,
>> but I'm less and less convinced that it's actually useful.
>> It's a lot to think about if you want to design a module that behaves
>> correctly, and for some combinations it's not clear what "correctly" means.
> I agree. I think we can leave out these two "features".
>>> The API design for defining types through the stable ABI
>>> (https://www.python.org/dev/peps/pep-0384/#type-objects), which was
>>> designed with the benefit of years of experience with the old
>>> approach, is much nicer, as the NULL-terminated list of named slots
>>> lets you only worry about the slots you care about, and the
>>> interpreter takes care of everything else.
>> Well, if we end up needing to extend PyModuleDef, let's use slots.
> That means we have to enable support for that now. And we have to integrate
> it with the way to provide the PyModuleDef in the first place (note that
> extending PyModuleDef itself is not an option due to the stable ABI).
> Meaning, users who don't want to provide a Create function will still have
> to deal with the (empty) slots, and everyone else will currently have to
> provide a one-slot "create" entry.
> I'm not saying it's a bad idea, but it might not be a good one either.

I meant slots as in PEP 384's PyType_Slot – there'd be no empty slots to
deal with, you'd just set the ones to use.

It does mean deprecating PyModuleDef, though.

>>> That two level approach gives you all the same flexibility you have
>>> today by defining a custom Init hook (and more), but also lets you opt
>>> out of learning most of the details of the C data model if all you're
>>> really after is faster low level manipulation of data stored in Python
>>> objects.
>> A module def array additionally gives:
>> - support for non-ASCII module names
>> - a catalog of the modules the extension contains
>> but you can't use custom module subclasses -- unless a create slot is added
>> to the module def. (Or you can replace the sys.modules entry -- I believe
>> the overhead of a wasted empty module object is negligible.)
> Yes, I guess it would be. However, the replacement must happen before other
> code might access the module (e.g. by importing it), i.e. right after
> putting it into sys.modules, at the very start of the Exec step.
> It does feel like a hack, though, to design an interface that says "here's
> your module, throw it away if you like, but make sure to clean up what I
> left behind"...

Yes, it is a hack (and to be honest I think supporting properties on 
modules should feel hacky). Though a Create slot on module def would 
avoid the need for such a hack.
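
For reference, the sys.modules replacement hack looks roughly like this when written in Python. The class name, the exec_step function and the "demo_hack" module name are invented for the sketch; a C extension would do the equivalent at the very start of its Exec step.

```python
import sys
import types


class ModuleWithProperties(types.ModuleType):
    """Hypothetical module subclass that exposes a property."""

    @property
    def answer(self):
        return 42


def exec_step(module):
    """Hypothetical Exec step: swap the plain module for a subclass instance."""
    replacement = ModuleWithProperties(module.__name__)
    # Carry over whatever was already set on the original module object.
    replacement.__dict__.update(module.__dict__)
    # Replace the entry before any other code can import the module.
    sys.modules[module.__name__] = replacement
    return replacement


# Simulate the import system: put a plain module in sys.modules, then run Exec.
plain = types.ModuleType("demo_hack")
sys.modules[plain.__name__] = plain
exec_step(plain)
swapped = sys.modules["demo_hack"]
```

The plain module object created first is simply thrown away, which is the "wasted empty module object" mentioned above.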
