[Import-SIG] PEP 489: Redesigning extension module loading

Nick Coghlan ncoghlan at gmail.com
Sat Mar 21 12:04:09 CET 2015


On 21 March 2015 at 20:04, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Nick Coghlan schrieb am 21.03.2015 um 09:17:
>> I think my underlying assumption was that Cython would
>> typically use Create+Exec for speed, but might offer a slower
>> Exec-only option to get a more "Python-like" module behaviour that
>> allowed Cython acceleration of directory, zipfile and package __main__
>> modules, along with other modules intended to be executed with the
>> "-m" switch.
>
> How would these discourage (or disallow) usage of Create?

__main__ is a builtin module created by the interpreter during startup
and directly linked to things like the "-i" switch. It's not created
during the import process like a normal module, even when using -m.

So at the moment, all those execution mechanisms are limited to Python
source and compiled bytecode files - they don't support extension
modules at all. PEP 489 offers the opportunity to extend that support
to Exec-only extension modules, but doesn't do anything to improve the
situation for extension modules that also define Create.

>> On the capsule side of things, I think it's good to facilitate that as
>> an alternative to having C extension modules link directly to each
>> other
>
> Is anyone really doing that? AFAIK, it's not even portable.
>
> Cython wraps all user exported APIs as pointers in capsules and
> automatically unpacks them on the other side at import time. It even shares
> its own internal extension types (function, generator, memoryview, etc.)
> across modules these days, all using capsules.

Right, I was technically thinking of shared dependencies on common
external libraries, rather than linking directly to each other. Either
way, improving the discoverability and usability of the capsule
mechanism would be valuable - at the moment you either have to "just
know" it exists, or else be using something like Cython which takes
care of setting it up for you.

>> I briefly looked into C level UTF-8 support when adding a Unicode
>> literal to the ord() and chr() docs (I originally had it in the
>> docstring as well, and it was pointed out in review that that might
>> cause problems), and I'm not sure it's possible to sensibly support
>> arbitrary Unicode module names for extension modules while our
>> baseline assumption at the C level is C89 compatibility. We should
>> definitely aim to cope with the fact that extension module names
>> *might* contain arbitrary Unicode some day, even if we don't
>> officially support that yet.
>
> The main problem with the current scheme is that the name of the module
> file must match the name of the exported symbol(s), and the module file
> name is searched by the imported name. So there is a direct link between the
> (potentially non-ASCII) imported module name and the name of the
> (ASCII-only) exported entry point symbols. And the exported symbol names
> must be globally unique to support platforms with flat symbol namespaces.
> Uncoupling the imported module name from either the file name or the symbol
> name or even both isn't entirely obvious.
>
> I mean, ok, you could use a hash, or rather encode the name in punycode
> (and replace "-" by "_" in the symbol name). That would at least keep it
> somewhat readable for latin based scripts, while being fully backwards
> compatible to what we have (that was the whole point of the punycode
> design). Actually, why not just do that? :)

We do have a punycode encoder in the standard library, so it should be
possible to use that to determine a suitable hook name when given a
non-ASCII extension module to load.

For example:

    >>> "münchen".encode("punycode").replace(b"-", b"_")
    b'mnchen_3ya'

That would make it possible for "import münchen" to work with an
extension module by looking for a file named "münchen", but using
"PyInit_mnchen_3ya", "PyModuleCreate_mnchen_3ya" and
"PyModuleExec_mnchen_3ya" as the hooks to look for.

It wouldn't be pretty to write by hand, but it should be fine for
extension module generators like Cython and SWIG.
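
As a rough sketch of how a loader (or a generator like Cython) might
derive those names: the helper name hook_names is hypothetical, the
hook prefixes are the ones discussed in this thread, and leaving pure
ASCII names untouched is my assumption (punycode would otherwise
append a trailing "-" to them, which would change existing symbol
names).

```python
def hook_names(module_name):
    """Return candidate entry point symbol names for a module name.

    Hypothetical helper: only the final dotted component is used, and
    pure ASCII names are kept unchanged so existing symbols still match.
    """
    tail = module_name.rpartition(".")[2]
    if tail.isascii():
        suffix = tail
    else:
        # Punycode output is ASCII; "-" is not valid in a C identifier.
        suffix = tail.encode("punycode").replace(b"-", b"_").decode("ascii")
    prefixes = ("PyInit_", "PyModuleCreate_", "PyModuleExec_")
    return [prefix + suffix for prefix in prefixes]


print(hook_names("münchen"))
print(hook_names("pkg.spam"))
```

Note that replacing "-" with "_" is not reversible in general, so a
real scheme would need to decide whether collisions between encoded
names matter.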

>> I thought Brett actually implemented multi-module extension support a
>> while back (which this PEP would then inherit), but I can't find any
>> current evidence of that change, so either my recollection is wrong,
>> or my search skills are failing me :)
>
> How should that work? Would it just try to look up all "PyInit_*" symbols
> and call them? In arbitrary order?

I think this may be what Petr was referring to when he said the
current multi-module scheme only supported *loading* multiple modules
from the same file, but not finding them. You need to use OS level
symlinks or a similar mechanism to get the current finder to work in
this situation (and as you say, it's not clear what a finder would
look like in the absence of such filesystem level assistance - we
can't afford to scan every shared library for possible symbol
exports).
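
To make the finder limitation concrete, here is a simplified sketch of
the question a path based finder asks for each directory: "is there a
file named after the module with one of the known extension
suffixes?". This is not the actual FileFinder code, and the helper
name candidate_extension_filenames is made up for illustration.

```python
import importlib.machinery


def candidate_extension_filenames(module_name):
    """List the file names a path based finder would probe for.

    Simplified illustration: the real finder also checks packages,
    source files and bytecode, and caches directory listings.
    """
    tail = module_name.rpartition(".")[2]
    return [tail + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


print(candidate_extension_filenames("pkg.spam"))
```

Since the lookup is driven purely by file name, a second module living
in the same shared library is only found if something with its name
(such as a symlink) exists on the filesystem.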

>> It likely makes sense as a separate follow-on PEP for 3.6 though, as
>> it's a further simplification of a certain way of using Create+Exec,
>> and it's not clear just how you'd handle certain combinations of
>> values in the current PyModuleDef struct. PEP 489 currently deals with
>> that neatly by breaking out separate helper functions for initialising
>> the docstring and the module globals function table that can be called
>> from either Exec or Create as appropriate.
>
> While I agree that this can be done later, I also think that adding yet
> another interface after the current change will only make it more difficult
> for users to get started and get their stuff done.
>
> Exporting a struct does sound like the most generic and future proof
> approach so far. If(f) we already assume that it will eventually become
> useful, we shouldn't go for less.
>
> Do you have any specific problem with the PyModuleDef "value combinations"
> in mind? I mean, we could always apply further restrictions on the content
> of an exported PyModuleDef when used for this interface. Unexpected setups
> should be easy to validate and reject by the import machinery, even if it's
> just because it's "not currently supported". Being strict is easy here.

The main thing that makes me wary is the redesign of the type
definition system in PEP 384 to move away from exporting a static
struct to declare new type objects. We did that because it made
evolving the definition of type objects in an ABI compatible way very
difficult.

On the other hand, the main problems there were really the giant
collection of slot pointers, which PyType_FromSpec replaced with a
null-terminated array of slot definitions, and the fact that
you were exporting the type struct directly. By contrast, PyModuleDef
is already distinct from the actual internal layout of CPython module
objects, and PyModuleDef.m_methods is already a null-terminated array
of PyMethodDef entries.

So, if we went down this path you *wouldn't* be able to completely
customise module creation - you'd just have the option of exporting a
PyModuleDef struct that the interpreter would then pass to
PyModule_Create() on your behalf. If you wanted to replace the
extension module with a different kind of object entirely, you'd swap
it out of sys.modules in your Exec implementation, just as pure Python
modules can replace themselves in module level code.
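
For reference, the pure Python version of that trick looks roughly
like the following. The module name selfswap_demo and the Replacement
class are hypothetical; the sketch writes a throwaway module to a
temporary directory and relies on the import system returning whatever
ends up in sys.modules under the module's name after execution.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a module that swaps itself out during execution.
MODULE_SOURCE = '''
import sys

class Replacement:
    """Arbitrary object that stands in for the module."""
    answer = 42

sys.modules[__name__] = Replacement()
'''


def import_self_replacing_module(name="selfswap_demo"):
    """Import the demo module and return what the import system hands back."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text(MODULE_SOURCE, encoding="utf-8")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(name)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


obj = import_self_replacing_module()
print(type(obj).__name__, obj.answer)
```

An Exec hook in an extension module would do the equivalent
sys.modules assignment from C.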

The big advantage of this approach is that it ties PEP 489 directly
back to the module state management enhancements in PEP 3121 - if you
need more control than the new PyModule_SetDocString and
PyModule_AddFunctions interfaces give you, then you need to export a
PyModuleDef to define how the module gets created, including the
ability to set m_size to -1 to indicate you're using C level module
globals, or to > 0 to reserve additional space for module state.

I think you've sold me on the idea - I'm not seeing any major
downsides any more, and a lot of enhancements. The one refinement I
would make is to allow "m_name" to be NULL to request that the import
machinery fill it in automatically.

>> With the current design of PEP 489, the idea is that if you don't
>> really care about the module object, you just define Exec, and the
>> interpreter gives you a standard Python level module object. All your
>> global state still gets stored as Python objects, and you just get the
>> "C execution model with the Python data model" development experience
>> which is actually quite a nice environment to program in.
>>
>> However, if you want straightforward access to the C *data* model at
>> runtime as well as its execution model, then you can define Create and
>> use the existing PyModule_Create APIs, or (as a new feature) a custom
>> module subclass or a completely custom type, to define how your module
>> state is stored.
>>
>> That two level approach gives you all the same flexibility you have
>> today by defining a custom Init hook (and more), but also lets you opt
>> out of learning most of the details of the C data model if all you're
>> really after is faster low level manipulation of data stored in Python
>> objects.
>
> I'm ok with either, but I'd really like to avoid replacing the new scheme
> by yet another new scheme in the future.

Aye, you've persuaded me that we don't need to allow full
customisation - an implicit call to PyModule_Create() should suffice.

We're going to need to be careful in the interaction with PEP 384,
though. Currently, that has the call to PyModule_Create() in the
extension module with the PyMethodDef declaration, and passes in
information regarding the expected CPython ABI. That allows the
interpreter to process the MethodDef appropriately for the stable ABI.
I'm not sure how the details of that work internally myself, so Petr
would need to check into the consequences before committing to change
the PEP.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

