[Import-SIG] PEP 489: Redesigning extension module loading

Petr Viktorin encukou at gmail.com
Thu Mar 19 14:37:36 CET 2015


On 03/19/2015 11:31 AM, Stefan Behnel wrote:
> Hi Petr,
>
> thanks for working on this. I added my comments inline.

Thanks for your comments, they're a nice reality check.
I feel like Nick and I misunderstood Cython's requirements somewhat, 
and concentrated on unimportant points (loading into pre-created 
modules) while ignoring important ones (fast access to module state). 
You also pointed out interesting things we hadn't thought about much 
(non-ASCII names, multi-module extensions).

One of the PEP's stated goals is that the behavior of extension modules 
should be closer to Python modules. But if the solution (Exec-only 
modules) doesn't work for Cython, then the goal is pretty much 
irrelevant. I believe PyCapsule is the cleanest way of putting C state 
onto arbitrary objects, and at this point I can say it's not working.

Perhaps it's time to say that extension modules *are* fundamentally 
different from pure Python ones. (And rewrite the PEP. *sigh*)

I'll keep your comments in mind, but I have this idea that could make 
them obsolete; I'll reply to them if it gets shot down.

>> Multiple modules in one library
>> -------------------------------
>>
>> To support multiple Python modules in one shared library, the library
>> must export appropriate PyModuleExec_<name> or PyModuleCreate_<name> hooks
>> for each exported module.
>> The modules are loaded using a ModuleSpec with origin set to the name of the
>> library file, and name set to the module name.
>>
>> Note that this mechanism can currently only be used to *load* such modules,
>> not to *find* them.
>>
>> XXX: This is an existing issue; either fix it/wait for a fix or provide
>> an example of how to load such modules.
>
> I really like that idea. It's essentially an extended inittab mechanism,
> also usable for executable single-file distributions (maybe even "python
> -m"), non-ASCII module names and "__init__.so" packages that import as an
> entire package structure of multiple modules.
>
> Needs some kind of "import module from library" C-API mechanism, though, or
> at least an explicitly exported list of modules to import from a shared
> library in the right order. I'd rather go for some kind of explicit import
> that creates these modules on request.


It seems that, with this PEP, the main reason for extension authors to 
implement Create would be to get per-module state. PyCapsules in the 
module dict are not a good idea speed-wise; static C-level data is not 
an option if subinterpreters need to be supported.

The "inittab" idea made me think of this:

An extension could export an array of PyModuleDef, which has all the 
needed data for module creation and initialization:

- m_name - for the "requested" name for the module (not necessarily what 
it'll be loaded as), for identifying modules in multi-module extensions
- m_size - for requesting per-module C state
- m_reload (currently unused) would be the exec function (called for 
initialization and reload)
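
To make the flow concrete, here is a rough Python model of it. This is 
only a sketch: the real thing would be C, and every name in it 
(ModuleDefModel, EXPORTED_DEFS, load_from_defs, the m_exec field standing 
in for m_reload) is made up for illustration, not an existing API.

```python
import types
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModuleDefModel:
    """Hypothetical Python stand-in for one exported PyModuleDef entry."""
    m_name: str   # "requested" name, used to pick the entry in the array
    m_size: int   # number of bytes of per-module state requested
    # Plays the role the message suggests for m_reload: the exec function.
    m_exec: Callable[[types.ModuleType, bytearray], None]


def _exec_spam(module, state):
    state[0] = 1          # touch the per-module state
    module.answer = 42    # populate the module namespace


def _exec_eggs(module, state):
    module.kind = "eggs"


# What a multi-module extension would export (illustrative data only).
EXPORTED_DEFS = [
    ModuleDefModel("spam", 8, _exec_spam),
    ModuleDefModel("eggs", 0, _exec_eggs),
]


def load_from_defs(defs, requested_name, loaded_as=None):
    """Find the entry by m_name, create a plain module, then run exec."""
    for entry in defs:
        if entry.m_name == requested_name:
            break
    else:
        raise ImportError("no definition named %r" % requested_name)
    # The module may be loaded under a different name than it requested.
    module = types.ModuleType(loaded_as or requested_name)
    state = bytearray(entry.m_size)   # models the per-module C state
    entry.m_exec(module, state)
    return module, state
```

Calling load_from_defs(EXPORTED_DEFS, "spam", loaded_as="pkg.spam") picks 
the entry by its requested name but creates the module as "pkg.spam". 
The loader always creates an ordinary module object here; there is no 
Create hook in this model.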

This would rule out completely custom module objects, but are those 
needed anyway? A module can always replace itself in sys.modules if it 
needs extra magic. Getting rid of Create entirely supports a lot of the 
other goals (running user code in Create, pushing for subinterpreter 
support). And things like module properties or callable modules are not 
possible in source modules either; perhaps those should be solved at a 
higher level.
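
For reference, this is roughly what the sys.modules replacement trick 
looks like from a source module. It is a minimal sketch: the module name 
selfswap_demo, the _CallableModule class and the import_self_replacing 
helper are invented for the example, and the module is written to a 
temporary directory only so the snippet is self-contained.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Source of a module that swaps itself for a callable module object.
SOURCE = textwrap.dedent('''
    import sys
    import types

    class _CallableModule(types.ModuleType):
        def __call__(self, *args):
            return sum(args)

    _replacement = _CallableModule(__name__)
    # Carry over everything defined so far (__spec__, __file__, ...).
    _replacement.__dict__.update(sys.modules[__name__].__dict__)
    sys.modules[__name__] = _replacement
''')


def import_self_replacing(module_name="selfswap_demo"):
    """Write the module to a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, module_name + ".py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # The import system hands back whatever is in sys.modules
            # after the module body has run, i.e. the replacement.
            return importlib.import_module(module_name)
        finally:
            sys.path.remove(tmp)
```

The object returned by import_self_replacing() is the replacement, so it 
can be called like a function. An extension's exec function could do the 
same swap if it needed a custom module type.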

With this, you couldn't load extensions into arbitrary objects. But it 
would be possible to load into pre-created modules, as long as they were 
pre-created with the correct ModuleDef. It would probably be somewhat 
more difficult to make runpy (or custom loading libraries) work with 
these extension modules, but it should be possible.

Implementation-wise, having m_reload filled in from the start would 
help: the PEP calls for looking up two entrypoints, and the lookup is 
relatively expensive (judging by the amount of caching in current code).

It would also help with non-ASCII names, since the name is a string 
rather than a C identifier. Entrypoint and file names would need some 
design to make everything work. But before I go thinking about that: 
Does this seem like a better direction than Create/Exec?


