[Import-SIG] PEP 489: Redesigning extension module loading

Stefan Behnel stefan_ml at behnel.de
Sat Mar 21 18:38:40 CET 2015


Petr Viktorin wrote on 21.03.2015 at 11:30:
> It would be nice to extend runpy to handle Create+Exec modules. If this can
> be pulled off, there'd be no need for Exec-only modules except the
> convenience.
> 
> * module reloading is useless for extension modules – a changed version
> can't be read from the disk, and correct reload behavior is another
> corner case for authors to think about

I think even shared library reloading could be achieved by using a filename
scheme like "modulename-HASH.so" with a SHA hash of the source file or so,
if the original module name is used to run the right module init function(s).

The files would pile up in memory, though (there's usually no "dynamic
unlinking"), so it's not a feature for production. I generally agree that
there is little enough of a use case for reloading that it can safely be
ignored.
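
For illustration only, the naming scheme could look roughly like this in
Python. The helper names, the use of SHA-256 and the truncated digest are
assumptions made for this sketch; only the "PyInit_" prefix is the existing
CPython convention for the init function.

```python
import hashlib
import os


def hashed_library_name(module_name, source_path, suffix=".so", digest_len=12):
    """Build a file name like 'modulename-HASH.so' from the source contents.

    Hypothetical helper: the digest length and hash algorithm are arbitrary.
    """
    sha = hashlib.sha256()
    with open(source_path, "rb") as f:
        # Hash the file in chunks so large sources need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            sha.update(chunk)
    return "%s-%s%s" % (module_name, sha.hexdigest()[:digest_len], suffix)


def init_function_name(library_file_name):
    """Derive the init function name from the original module name.

    Module names are identifiers and cannot contain '-', so everything
    before the first '-' is the original module name.
    """
    base = os.path.basename(library_file_name)
    module_name = base.split("-", 1)[0]
    return "PyInit_" + module_name
```

Each changed source would then map to a new file name, while the init
function looked up in the library stays the same.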


> One thing I'm not clear about: what are the advantages of a module subclass
> over a normal module with m_size>0?

Properties and methods. In fact, you should rather ask why module objects
have to be special in the first place.
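
A small Python sketch of the difference, assuming a made-up module type
called CounterModule: a property only works when it lives on the module's
type, so a plain module object cannot offer one.

```python
import types


class CounterModule(types.ModuleType):
    """Hypothetical module subclass with a read-only property and a method."""

    def __init__(self, name):
        super().__init__(name)
        self._calls = 0

    @property
    def calls(self):
        # Computed on access; there is no setter, so it is read-only.
        return self._calls

    def bump(self):
        self._calls += 1
        return self._calls


mod = CounterModule("counter")
mod.bump()

# On a plain module, a property stored as an attribute is just an object;
# attribute access returns the property itself instead of calling it.
plain = types.ModuleType("plain")
plain.calls = property(lambda self: 0)
```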

My initial idea was to implement *only* an extension type in extension
modules, and have the library loader instantiate that. It would simply pass
the module spec as constructor argument. However, Nick convinced me at the
time that that's a) too inflexible and b) too cumbersome for manually
written code. That eventually brought up the idea of splitting the
initialisation into Create+Exec.


>> I thought Brett actually implemented multi-module extension support a
>> while back (which this PEP would then inherit), but I can't find any
>> current evidence of that change, so either my recollection is wrong,
>> or my search skills are failing me :)
> 
> It's there, grep issue16421.

Thanks. I didn't know about it.


> Separating Create and Exec has these effects:
> - Allowing you to implement just one and leave the rest to default
> machinery. This is good.
> - Allowing some time to pass between the calls to Create and Exec. This
> might be useful for lazy loading, I guess.
> - Allowing the loader or third-party code to modify the object between
> the calls to Create and Exec. This is dangerous (for consenting adults who
> don't mind the occasional segfault).

Depends on what they do with the object. Setting attributes on it should be
ok, for example. In fact, I would like to leave it to CPython to set
attributes like "__name__" and "__file__" on it, because that simplifies
the implementation of a Create function. From time to time, the module
interface is extended with new attributes, so setting them externally
avoids the need to adapt the user code each time.

However, an API helper function could be provided that copies attributes
from the module spec to the 'module' object. Calling that is simple enough,
and it would leave the responsibility for the evolution of the "standard
module API" in CPython.
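
Expressed in Python, such a helper might look roughly like the following.
The function name copy_spec_attributes is hypothetical, and the set of
attributes it copies is an assumption based on what the import system
sets on modules today; the real helper would be a C API function.

```python
from importlib.machinery import ModuleSpec


def copy_spec_attributes(spec, module):
    """Hypothetical helper: copy the standard attributes from a module spec.

    'module' may be any object that accepts attribute assignment, not
    necessarily a real module object.
    """
    module.__name__ = spec.name
    module.__spec__ = spec
    module.__loader__ = spec.loader
    module.__package__ = spec.parent
    if spec.has_location:
        # Only modules loaded from a concrete location get __file__.
        module.__file__ = spec.origin
    if spec.submodule_search_locations is not None:
        # Only packages get __path__.
        module.__path__ = spec.submodule_search_locations
    return module
```

A Create function would then build whatever object it likes and call the
helper once, without having to know which attributes the current CPython
version expects.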


> - Allowing Exec to be called multiple times after Create, i.e. module
> reloading. I don't think there is a use case (and for module-specific cases
> it can be done in a separately exported function).
> - Allowing Exec without the corresponding Create, i.e. loading into
> arbitrary objects. This is cool, and it mimics what source modules can do,
> but I'm less and less convinced that it's actually useful.
> 
> It's a lot to think about if you want to design a module that behaves
> correctly, and for some combinations it's not clear what "correctly" means.

I agree. I think we can leave out these two "features".


>> The API design for defining types through the stable ABI
>> (https://www.python.org/dev/peps/pep-0384/#type-objects), which was
>> designed with the benefit of years of experience with the old
>> approach, is much nicer, as the NULL-terminated list of named slots
>> lets you only worry about the slots you care about, and the
>> interpreter takes care of everything else.
> 
> Well, if we end up needing to extend PyModuleDef, let's use slots.

That means we have to enable support for that now. And we have to integrate
it with the way to provide the PyModuleDef in the first place (note that
extending PyModuleDef itself is not an option due to the stable ABI).
Meaning, users who don't want to provide a Create function will still have
to deal with the (empty) slots, and everyone else will currently have to
provide a one-slot "create" entry.

I'm not saying it's a bad idea, but it might not be a good one either.


> Another possible extension is hooks for resources. Imagine using Cython
> like zipapp, to pack an entire app including extensions into one file.

This can already be done. Note that there is no actual need for a native
module to be called by "python -m". You can also just add a C main()
function and start up an embedded CPython runtime in it. Cython can already
generate this main() function for you.

However, being able to "python -m" a native module (or package) would be
nice for consistency and would also support running it from the PYTHONPATH,
which is a major convenience feature.


>> With the current design of PEP 489, the idea is that if you don't
>> really care about the module object, you just define Exec, and the
>> interpreter gives you a standard Python level module object. All your
>> global state still gets stored as Python objects, and you just get the
>> "C execution model with the Python data model" development experience
>> which is actually quite a nice environment to program in.
>>
>> However, if you want straightforward access to the C *data* model at
>> runtime as well as its execution model, then you can define Create and
>> use the existing PyModule_Create APIs, or (as a new feature) a custom
>> module subclass or a completely custom type, to define how your module
>> state is stored.
> 
> The problem is that to add C data, you'd either need to define a whole
> extra hook, or jump through inefficient PyCapsule hoops on every access. I
> worry that module authors will just take the path of least resistance, and
> use static data. I think it's substantially better to say "use
> sizeof(mydata) instead of 0, and use this fast function/macro to get at
> your data".

Yes, there should be a fast default way to do that. Otherwise, people will
just invent their own. The advantage of subinterpreter support and module
finalisation isn't immediately obvious, but the advantage of fast access to
global state definitely is.


>> That two level approach gives you all the same flexibility you have
>> today by defining a custom Init hook (and more), but also lets you opt
>> out of learning most of the details of the C data model if all you're
>> really after is faster low level manipulation of data stored in Python
>> objects.
> 
> A module def array additionally gives:
> - support for non-ASCII module names
> - a catalog of the modules the extension contains
> but you can't use custom module subclasses -- unless a create slot is added
> to the module def. (Or you can replace the sys.modules entry -- I believe
> the overhead of a wasted empty module object is negligible.)

Yes, I guess it would be. However, the replacement must happen before other
code might access the module (e.g. by importing it), i.e. right after
putting it into sys.modules, at the very start of the Exec step.
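
As a Python-level sketch of that ordering (FancyModule and exec_module are
made-up names, and the list of attributes carried over is an assumption),
the replacement is the very first thing the Exec step does:

```python
import sys
import types


class FancyModule(types.ModuleType):
    """Hypothetical module subclass that replaces the default module."""

    @property
    def answer(self):
        return 42


def exec_module(module):
    """Sketch of an Exec step that swaps out the module it was handed."""
    name = module.__name__
    replacement = FancyModule(name)
    # Carry over what the import machinery already set on the default module.
    for attr in ("__spec__", "__loader__", "__package__", "__file__"):
        if hasattr(module, attr):
            setattr(replacement, attr, getattr(module, attr))
    # Replace the sys.modules entry before anything else can import it.
    sys.modules[name] = replacement
    # ... the rest of the initialisation would populate 'replacement' ...
    return replacement
```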

It does feel like a hack, though, to design an interface that says "here's
your module, throw it away if you like, but make sure to clean up what I
left behind"...

Stefan



