[Import-SIG] PEP 489: Redesigning extension module loading

Stefan Behnel stefan_ml at behnel.de
Sat Mar 21 11:04:17 CET 2015

Nick Coghlan schrieb am 21.03.2015 um 09:17:
> I think my underlying assumption was that Cython would
> typically use Create+Exec for speed, but might offer a slower
> Exec-only option to get a more "Python-like" module behaviour that
> allowed Cython acceleration of directory, zipfile and package __main__
> modules, along with other modules intended to be executed with the
> "-m" switch.

How would these discourage (or disallow) usage of Create?

> On the capsule side of things, I think it's good to facilitate that as
> an alternative to having C extension modules link directly to each
> other

Is anyone really doing that? AFAIK, it's not even portable.

Cython wraps all user exported APIs as pointers in capsules and
automatically unpacks them on the other side at import time. It even shares
its own internal extension types (function, generator, memoryview, etc.)
across modules these days, all using capsules.
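To make the capsule mechanism concrete: Cython does the wrapping and unpacking in generated C code, but a capsule can also be inspected from Python. The sketch below is only an illustration, not what Cython emits. It assumes CPython, and it uses the standard library's existing "datetime.datetime_CAPI" capsule together with ctypes to call the real PyCapsule_GetName and PyCapsule_GetPointer functions.

```python
import ctypes
import datetime

# The datetime module already exports its C API as a capsule object.
capsule = datetime.datetime_CAPI

# Declare the C signatures of the two capsule functions used here.
api = ctypes.pythonapi
api.PyCapsule_GetName.restype = ctypes.c_char_p
api.PyCapsule_GetName.argtypes = [ctypes.py_object]
api.PyCapsule_GetPointer.restype = ctypes.c_void_p
api.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# A capsule carries a name and an opaque C pointer. The importing side
# must ask for the pointer under the same name the exporter used.
name = api.PyCapsule_GetName(capsule)
pointer = api.PyCapsule_GetPointer(capsule, name)

print(name)               # the capsule's registered name
print(pointer is not None)  # a non-NULL pointer to the C API struct
```

The importing module only needs the capsule's name and an agreed struct layout, so nothing has to be linked at build time.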

> but I'm not sure it makes sense to encourage it as a way for a
> module to access its *own* state that can't readily be stored in a
> Python dictionary as a normal Python object. So perhaps the patterns
> to encourage here are:
> * prefer only defining Exec, with state stored as Python objects in
> the module globals
> * if you need C level global state, then you need to define Create as
> well and return a suitable object, such as a PyModule subclass, or the
> result of calling PyModule_Create with m_size > 0 in PyModuleDef
> * if you also need fast access to operations defined in other
> extension modules, prefer reading and saving references to the
> relevant capsule objects in Exec over direct C level linking at build
> time


> (Regarding that last point, we may want to some day consider exposing
> suitable capsules for some C accelerated standard library modules,
> like _decimal, rather than expanding the C API itself to cover those
> types)

+10 :)

> I briefly looked into C level UTF-8 support when adding a Unicode
> literal to the ord() and chr() docs (I originally had it in the
> docstring as well, and it was pointed out in review that that might
> cause problems), and I'm not sure it's possible to sensibly support
> arbitrary Unicode module names for extension modules while our
> baseline assumption at the C level is C89 compatibility. We should
> definitely aim to cope with the fact that extension module names
> *might* contain arbitrary Unicode some day, even if we don't
> officially support that yet.

The main problem with the current scheme is that the name of the module
file must match the name of the exported symbol(s), and the module file
is looked up by the imported name. So there is a direct link between the
(potentially non-ASCII) imported module name and the name of the
(ASCII-only) exported entry point symbols. And the exported symbol names
must be globally unique to support platforms with flat symbol namespaces.
How to decouple the imported module name from the file name, the symbol
name, or both isn't entirely obvious.

I mean, ok, you could use a hash, or rather encode the name in Punycode
(and replace "-" by "_" in the symbol name). That would at least keep it
somewhat readable for Latin-based scripts, while being fully backwards
compatible with what we have (that was the whole point of the Punycode
design). Actually, why not just do that? :)
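A minimal sketch of that encoding idea, using Python's built-in "punycode" codec. The helper name entry_point_symbol and the ASCII pass-through are assumptions made for illustration: the codec appends a trailing "-" delimiter even to pure-ASCII input, so ASCII names are returned unchanged here to keep existing PyInit_<name> symbols intact.

```python
def entry_point_symbol(module_name: str, prefix: str = "PyInit_") -> str:
    """Hypothetical helper: derive an ASCII-only symbol from a module name."""
    try:
        # Pure-ASCII names keep today's symbol name unchanged.
        module_name.encode("ascii")
        return prefix + module_name
    except UnicodeEncodeError:
        # Non-ASCII names are Punycode-encoded; "-" is not valid in a
        # C identifier, so it is replaced by "_".
        encoded = module_name.encode("punycode").decode("ascii")
        return prefix + encoded.replace("-", "_")


print(entry_point_symbol("spam"))     # PyInit_spam
print(entry_point_symbol("münchen"))  # PyInit_mnchen_3ya
```

One caveat of this sketch: with a shared prefix, an ASCII module literally named "mnchen_3ya" would map to the same symbol as "münchen", so a real scheme would need some way to tell the two cases apart, for example a distinct prefix for encoded names.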

> I thought Brett actually implemented multi-module extension support a
> while back (which this PEP would then inherit), but I can't find any
> current evidence of that change, so either my recollection is wrong,
> or my search skills are failing me :)

How should that work? Would it just try to look up all "PyInit_*" symbols
and call them? In arbitrary order?

> While looking for such evidence, I was also reminded of the fact that
> https://docs.python.org/3/c-api/ is missing a reference section on how
> extension module importing actually works - the only current
> explanation is in the more tutorial style
> https://docs.python.org/3/extending/extending.html#the-module-s-method-table-and-initialization-function
> That missing reference section is a docs gap that should likely be
> fixed as part of these changes.


>>> The "inittab" idea made me think of this:
>>> An extension could export an array of PyModuleDef, which has all the needed
>>> data for module creation and initialization:
>> I remember discussing this on python-dev, it was one of the ideas in the
>> original thread that led to the Create-Exec proto-pep:
>> http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986
>> I think the main counter argument at the time was that there should be a
>> way to control the module object instantiation. :)
> It's an interesting notion - you could export the arguments to a call
> to PyModule_Create (and/or PyModule_Create2, and/or a new different
> function that accepts a different declaration API) and have an
> entirely static module initialisation process in at least some cases.
> It likely makes sense as a separate follow-on PEP for 3.6 though, as
> it's a further simplification of a certain way of using Create+Exec,
> and it's not clear just how you'd handle certain combinations of
> values in the current PyModuleDef struct. PEP 489 currently deals with
> that neatly by breaking out separate helper functions for initialising
> the docstring and the module globals function table that can be called
> from either Exec or Create as appropriate.

While I agree that this can be done later, I also think that adding yet
another interface after the current change will only make it more difficult
for users to get started and get their stuff done.

Exporting a struct does sound like the most generic and future proof
approach so far. If(f) we already assume that it will eventually become
useful, we shouldn't go for less.

Do you have any specific problem with the PyModuleDef "value combinations"
in mind? I mean, we could always apply further restrictions on the content
of an exported PyModuleDef when used for this interface. Unexpected setups
should be easy to validate and reject by the import machinery, even if it's
just because it's "not currently supported". Being strict is easy here.

> With the current design of PEP 489, the idea is that if you don't
> really care about the module object, you just define Exec, and the
> interpreter gives you a standard Python level module object. All your
> global state still gets stored as Python objects, and you just get the
> "C execution model with the Python data model" development experience
> which is actually quite a nice environment to program in.
> However, if you want straightforward access to the C *data* model at
> runtime as well as its execution model, then you can define Create and
> use the existing PyModule_Create APIs, or (as a new feature) a custom
> module subclass or a completely custom type, to define how your module
> state is stored.
> That two level approach gives you all the same flexibility you have
> today by defining a custom Init hook (and more), but also lets you opt
> out of learning most of the details of the C data model if all you're
> really after is faster low level manipulation of data stored in Python
> objects.

I'm ok with either, but I'd really like to avoid replacing the new scheme
with yet another new scheme in the future.
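For comparison, the Create/Exec split discussed above has a Python-level analogue in the PEP 451 loader protocol (create_module and exec_module). The sketch below only illustrates that analogue and is not the proposed C API. The class names, the load helper and the "counter" attribute are hypothetical.

```python
import importlib.abc
import importlib.util
import types


class ExecOnlyLoader(importlib.abc.Loader):
    """Exec only: the import machinery supplies a standard module object."""

    def create_module(self, spec):
        return None  # None requests the default module type

    def exec_module(self, module):
        module.counter = 0  # state lives in the module globals


class CustomModule(types.ModuleType):
    """Stand-in for a module subclass returned by a Create hook."""


class CreateExecLoader(importlib.abc.Loader):
    """Create + Exec: the loader controls module object instantiation."""

    def create_module(self, spec):
        return CustomModule(spec.name)

    def exec_module(self, module):
        module.counter = 0


def load(name, loader):
    # Mirrors the two-step process: create the module, then execute it.
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


plain = load("demo_exec_only", ExecOnlyLoader())
custom = load("demo_create_exec", CreateExecLoader())
print(type(plain).__name__, type(custom).__name__)
```

In both cases the execution step is identical. Only the loader that wants control over the module object has to implement the creation step.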

