[Import-SIG] PEP 489: Redesigning extension module loading

Nick Coghlan ncoghlan at gmail.com
Thu Mar 26 05:25:45 CET 2015


On 25 March 2015 at 23:36, Petr Viktorin <encukou at gmail.com> wrote:
> On 03/25/2015 01:11 PM, Nick Coghlan wrote:
>>
>> On 25 March 2015 at 02:34, Petr Viktorin <encukou at gmail.com> wrote:
>>>
>>> I'll share my notes on an API with PEP 384-style slots, before attempting
>>> to write it out in PEP language.
>>>
>>> I struggled to find a good name for the "PyType_Spec" equivalent, since
>>> ModuleDef and ModuleSpec are both taken, but then I realized that, if the
>>> docstring is put in a slot, I just need an array of slots...
>>
>>
>> Because we're looking for an exported symbol, I think there's value in
>> having a more clearly defined top level structure rather than just an
>> array.
>
>
> OK.
> I'm not sure about cross-platform support for data (rather than functions)
> exported from shared libraries, so I kept the hook as a function.
> Perhaps I'm being too paranoid here?

Given that http://bugs.python.org/issue23743 came across my inbox this
morning, I'm going to go with "No, you're not being too paranoid once
we take C++ compilers and linkers into account".

Perhaps we could make it use a new PyExport prefix though and drop the
integer IDs in favour of exporting additional symbols? That is, have
the hook be "PyExport_spam" with a separate "PyExport_spam_methods"?

That opens the door to potentially having *other* export APIs in the
future, like "PyExport_spam_codecs", "PyExport_spam_types",
"PyExport_spam_constants_str".

The main downside I see is that the loader may need to check the shared
library's exported symbols at import time for a growing series of
names, so we should also consider a variant of this idea that keeps
the numeric slots instead of the symbol suffixes described above.
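To make that cost concrete, here is a rough C sketch of the loader side
of the suffix scheme. It assumes a hypothetical in-memory export table
standing in for dlsym()/GetProcAddress(); the helper names and the
suffix list are illustrative only. Each known suffix costs one more
lookup at import time:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a shared library's export table; a real
   loader would call dlsym() or GetProcAddress() instead. */
typedef struct {
    const char *name;
    void *addr;
} export_entry;

static int spam_hook_marker, spam_methods_marker;

static const export_entry spam_exports[] = {
    {"PyExport_spam", &spam_hook_marker},
    {"PyExport_spam_methods", &spam_methods_marker},
    {NULL, NULL}
};

/* Build "PyExport_<module>" or "PyExport_<module>_<suffix>" into buf.
   Returns 0 on success, -1 if the name does not fit. */
static int export_name(char *buf, size_t len, const char *module,
                       const char *suffix)
{
    int n = suffix
        ? snprintf(buf, len, "PyExport_%s_%s", module, suffix)
        : snprintf(buf, len, "PyExport_%s", module);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

static void *find_export(const export_entry *table, const char *name)
{
    assert(name != NULL);
    for (; table->name != NULL; table++)
        if (strcmp(table->name, name) == 0)
            return table->addr;
    return NULL;
}

/* Hypothetical list of optional hooks: every new export API adds an
   entry here, and therefore one more lookup per imported module. */
static const char *const known_suffixes[] = {
    "methods", "codecs", "types", "constants_str", NULL
};

static int count_exports(const export_entry *table, const char *module)
{
    char name[128];
    int found = 0;
    for (const char *const *s = known_suffixes; *s != NULL; s++)
        if (export_name(name, sizeof name, module, *s) == 0 &&
            find_export(table, name) != NULL)
            found++;
    return found;
}
```

Here only "PyExport_spam_methods" is found among the optional hooks;
the other three probes are wasted work that grows with the list.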

>> PyModule_Export or PyModule_Declare come to mind, with a preference
>> for the former (since we're exporting a module definition for CPython
>> to import)
>
> That's the name I was looking for, thanks!

https://www.python.org/dev/peps/pep-0459/#the-python-exports-extension
(which I first drafted some time back) came up in another discussion
recently, and my brain finally connected it back to the C extension
module API design problem :)

I'm wondering if PyExportDef_Module might be a better name though
(more on that below).

>> typedef struct PyModule_Export {
>>    const char* doc;
>>    PyModule_Slot *slots; /* terminated by slot==0. */
>> } PyModule_Export;
>>
>> I prefer this mostly because it's easier to document and hence to
>> understand - you can cover the process of creating the overall module
>> in relation to PyModule_Export, while PyModule_Slot docs can focus on
>> defining the *content* of the module.
>
> I don't think this is a problem. I can document module creation via the
> PyModuleExport_<modulename> symbol, and then say that it's an array of
> PyModule_Slot in the appropriate section.

As you can see above, I realised we may be thinking about this the
wrong way: we don't necessarily need to worry about making
PyModule_Export itself extensible, as if we want to allow additional
"addons" later, we can potentially use the C level linker namespace.

In that model, each new slot would get a new suffix rather than a numeric ID.

>> Having the docstring as the only expected field helps suggest that
>> modules should at least define that much. Unlike types, we can leave
>> the name out by default, as it will usually be implied by the file
>> name (as is the case with Python modules).
>
> The downside is that it's additional boilerplate. PyType_Spec has a bunch of
> mandatory int fields, but here everything is a pointer.

A pointer which we're considering converting to (void *) and naming
via a relatively opaque integer. I can see the necessity for that in
the PyType_Spec case (given the huge number of slots and the fact
we're creating them dynamically rather than deriving them from a
shared library's exported symbols), but we're not talking anywhere
near that number of slots here, and we're already coupled to the C
linker semantics as that's how we find the initial export hook in the
first place.

> Also, does the docstring always need to be specified (as a constant)? I
> think some internal modules are fine without a docstring (see _hashlib,
> _multiprocessing, _elementtree, _sqlite3, ...).
>
> But if you're convinced a separate PyModule_Export structure is better, I
> won't fight.

I suspect it will be helpful if we replace the "named slots for future
expansion" idea with suffixed exported symbols, but would be less
useful if we keep the numbered slots.

>> You've sold me on the idea of using a slots based API, though.
>> However, the PEP's going to need to spend a bit more time on how to
>> map this to the existing PyModule_Create API for modules that also
>> want to support older versions of Python, while using the new system
>> on 3.5+.
>
> Agreed.

I suspect the multiple-exports approach will also make it easier to
provide compatibility boilerplate that folks can use to write a
backwards-compatible PyInit_spam shim, as the exports will all be
normal C functions that follow a defined naming scheme, whereas the
numeric slots case requires a bit more work to process the slots
correctly.
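A minimal sketch of such a shim, assuming hypothetical cut-down
stand-ins for PyMethodDef and PyModuleDef (there is no Python.h here,
so it only shows the data flow, not a real PyInit_spam):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the CPython structures;
   only the fields the shim touches are modelled. */
typedef struct {
    const char *ml_name;
} ExportDef_Method;

typedef struct {
    const char *doc;
    void *state;                 /* NULL: no per-module C state */
} ExportDef_Module;

typedef struct {                 /* loosely modelled on PyModuleDef */
    const char *m_name;
    const char *m_doc;
    ExportDef_Method *m_methods;
} LegacyModuleDef;

static ExportDef_Method spam_methods[] = { {"demo"}, {NULL} };
static ExportDef_Module spam_module = { "A spammy module", NULL };

/* New-style exports: ordinary C functions with a fixed naming scheme. */
ExportDef_Module *PyExport_spam(void) { return &spam_module; }
ExportDef_Method *PyExport_spam_methods(void) { return spam_methods; }

/* Body of a hypothetical PyInit_spam shim for older Pythons: it calls
   the exports directly, with no slot decoding. The legacy API needs an
   explicit name, which the new scheme would infer from the file name. */
static void fill_legacy_def(LegacyModuleDef *def)
{
    assert(def != NULL);
    def->m_name = "spam";
    def->m_doc = PyExport_spam()->doc;
    def->m_methods = PyExport_spam_methods();
}
```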

In a "multiple exported symbols" module, the struct definitions may
look something like:

    typedef struct PyExportDef_ModuleState {
         int size;
         traverseproc m_traverse;
         inquiry m_clear;
         freefunc m_free;
    } PyExportDef_ModuleState;

    typedef struct PyExportDef_Module {
        const char *doc;
        PyExportDef_ModuleState *state;
    } PyExportDef_Module;

    PyExportDef_Module *PyExport_spam(void);
    int PyExport_spam_exec(PyObject *mod);

OR, for complete customisation rather than using a standard module
object post-processed by the exec hook:

    PyObject * PyExport_spam_create(PyObject *mod_spec);
    int PyExport_spam_exec(PyObject *mod);

Exporting both the declarative PyExport_spam and the imperative
PyExport_spam_create would be an error. Either approach can be
combined with exporting PyExport_spam_exec, which would be run after
all other declarative hooks.
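Those rules could be enforced by the loader roughly as follows. This is
a sketch with hypothetical stand-in types and a simplified create hook
that takes no spec; a NULL member of module_hooks means the
corresponding PyExport_* symbol wasn't found:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a method table entry, the declarative module
   definition, and a module object that records what the loader did. */
typedef struct { const char *name; } method_def;
typedef struct { const char *doc; } module_def;
typedef struct {
    const char *doc;
    int n_methods;
    int exec_saw_methods;           /* methods present when exec ran */
} module;

/* Hooks found for one module; NULL means the symbol is not exported.
   (The real create hook would also receive the module spec.) */
typedef struct {
    module_def *(*declare)(void);   /* PyExport_<name>         */
    module *(*create)(void);        /* PyExport_<name>_create  */
    method_def *(*methods)(void);   /* PyExport_<name>_methods */
    int (*exec)(module *);          /* PyExport_<name>_exec    */
} module_hooks;

/* Returns the initialised module, or NULL on error. */
static module *load_module(const module_hooks *h, module *storage)
{
    module *m;
    assert(h != NULL && storage != NULL);
    if (h->declare != NULL && h->create != NULL)
        return NULL;                /* declarative + imperative: error */
    if (h->declare != NULL) {
        m = storage;                /* standard module object */
        m->doc = h->declare()->doc;
        m->n_methods = 0;
    } else if (h->create != NULL) {
        m = h->create();            /* fully customised module */
    } else {
        return NULL;                /* no way to create the module */
    }
    if (m == NULL)
        return NULL;
    if (h->methods != NULL)         /* declarative hooks first... */
        for (method_def *d = h->methods(); d->name != NULL; d++)
            m->n_methods++;
    if (h->exec != NULL && h->exec(m) < 0)
        return NULL;                /* ...then exec, last */
    return m;
}

/* Sample hooks for a module called spam. */
static module_def spam_def = { "A spammy module" };
static method_def spam_methods[] = { {"demo"}, {NULL} };
static module spam_custom;

static module_def *spam_declare(void) { return &spam_def; }
static method_def *spam_methods_hook(void) { return spam_methods; }
static module *spam_create(void)
{
    spam_custom.doc = "custom";
    spam_custom.n_methods = 0;
    return &spam_custom;
}
static int spam_exec(module *m) { m->exec_saw_methods = m->n_methods; return 0; }
```

Because exec runs last, it can rely on everything the declarative hooks
added (here it sees the one "demo" method).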

Rather than a separate slot, module level functions could be exported
easily via:

    PyMethodDef *PyExport_spam_methods(void);

(Option: alias PyMethodDef as PyExportDef_Method)

>> I'm also wondering if "exec" should move to be an "m_init" method in
>> PyModule_StateDef, rather than an independent slot, replacing it with
>> a PyType_Spec "types" slot as suggested below.
>
>
> No. Sometimes the exec doesn't need C state. It can work with just the
> module dict, for example to export some methods conditionally, or export
> objects that aren't methods/classes/whatever there's a special slot for.

In the above sketch, that would be indicated by setting the "state"
pointer to NULL.
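A sketch of how a loader might honour a NULL state pointer, assuming
simplified hypothetical stand-ins for the two struct definitions (only
size and m_free are modelled, and m_free here receives the state block
rather than the module object):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for PyExportDef_ModuleState and
   PyExportDef_Module. */
typedef struct {
    size_t size;                /* bytes of per-module C state */
    void (*m_free)(void *);     /* optional cleanup hook, may be NULL */
} state_def;

typedef struct {
    const char *doc;
    const state_def *state;     /* NULL: module keeps no C-level state */
} module_def;

typedef struct {
    const char *doc;
    void *state;
} module;

/* Allocate zeroed state only when the definition asks for it. */
static int init_module(module *m, const module_def *def)
{
    assert(m != NULL && def != NULL);
    m->doc = def->doc;
    m->state = NULL;
    if (def->state == NULL || def->state->size == 0)
        return 0;               /* exec can still work via the module dict */
    m->state = calloc(1, def->state->size);
    return m->state != NULL ? 0 : -1;
}

static void fini_module(module *m, const module_def *def)
{
    if (m->state != NULL && def->state->m_free != NULL)
        def->state->m_free(m->state);
    free(m->state);
    m->state = NULL;
}

/* Sample definitions: one module with C state, one without. */
typedef struct { long calls; } spam_state_t;
static int spam_frees;
static void spam_state_free(void *state) { (void)state; spam_frees++; }
static const state_def spam_statedef = { sizeof(spam_state_t), spam_state_free };
static const module_def spam_stateful = { "A spammy module", &spam_statedef };
static const module_def spam_stateless = { "No C state needed", NULL };
```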

>>> And then a slot adding string/int/... constants from arrays of name/value
>>> would mean most modules wouldn't need an exec function.
>>
>> For those cases, I think the module internally is likely to want fast
>> C level access to the relevant constants - this note is the one that
>> inspired my suggestion of moving the "exec" link into the statedef
>> slot.
>
>
> This is for wrapping constants that are already known at the C level.
> For example _ssl has a long list of these calls:
>     PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
>                             PY_SSL_ERROR_ZERO_RETURN);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_READ",
>                             PY_SSL_ERROR_WANT_READ);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_WRITE",
>                             PY_SSL_ERROR_WANT_WRITE);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_X509_LOOKUP",
>                             PY_SSL_ERROR_WANT_X509_LOOKUP);
>     PyModule_AddIntConstant(m, "SSL_ERROR_SYSCALL",
>                             PY_SSL_ERROR_SYSCALL);
>     PyModule_AddIntConstant(m, "SSL_ERROR_SSL",
>                             PY_SSL_ERROR_SSL);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_CONNECT",
>                             PY_SSL_ERROR_WANT_CONNECT);
>
> ... and so on. Many modules don't have proper error checking for this.

Ah, yes, I understand. Indeed, changing that to a pair of hooks that
export "name, value" pairs for integers or strings would be
valuable. Continuing the naming scheme from above:

    PyExportDef_Str *PyExport_spam_constants_str();
    PyExportDef_Int *PyExport_spam_constants_int();
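A sketch of the int variant, assuming a hypothetical record type and a
fixed-capacity stand-in for the module namespace so that additions can
fail. The point is that the error check lives in one loop in the loader
rather than after every PyModule_AddIntConstant call:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name/value record; a NULL name terminates the table.
   A PyExportDef_Str variant would look the same with a string value. */
typedef struct { const char *name; long value; } ExportDef_Int;

/* Hypothetical module namespace with fixed capacity, so that adding an
   attribute can fail the way PyModule_AddIntConstant can. */
#define MAX_ATTRS 4
typedef struct {
    const char *names[MAX_ATTRS];
    long values[MAX_ATTRS];
    size_t count;
} fake_module;

static int add_int(fake_module *m, const char *name, long value)
{
    if (m->count == MAX_ATTRS)
        return -1;
    m->names[m->count] = name;
    m->values[m->count] = value;
    m->count++;
    return 0;
}

/* The loader checks every addition once, on the module's behalf. */
static int add_int_constants(fake_module *m, const ExportDef_Int *defs)
{
    assert(m != NULL && defs != NULL);
    for (; defs->name != NULL; defs++)
        if (add_int(m, defs->name, defs->value) < 0)
            return -1;
    return 0;
}

/* Illustrative values only; the real PY_SSL_ERROR_* codes are in _ssl.c. */
enum { DEMO_ZERO_RETURN = 1, DEMO_WANT_READ, DEMO_WANT_WRITE };

static const ExportDef_Int spam_constants_int[] = {
    {"SSL_ERROR_ZERO_RETURN", DEMO_ZERO_RETURN},
    {"SSL_ERROR_WANT_READ", DEMO_WANT_READ},
    {"SSL_ERROR_WANT_WRITE", DEMO_WANT_WRITE},
    {NULL, 0}
};

/* What a PyExport_spam_constants_int-style hook could return. */
const ExportDef_Int *PyExport_spam_constants_int(void)
{
    return spam_constants_int;
}
```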

Applying this idea to your full extension example:

static PyExportDef_Method spam_methods[] = {
    {"demo", (PyCFunction)spam_demo,  ...},
    {NULL, NULL}
};

static PyExportDef_ModuleState spam_statedef = {
    sizeof(spam_state_t),
    spam_state_traverse,
    spam_state_clear,
    spam_state_free
    /* any of those three can be NULL if not needed */
};

static PyExportDef_Module spam_module = {
    PyDoc_STR("A spammy module"),
    &spam_statedef
};

PyExportDef_Module *PyExport_spam(void) {
    return &spam_module;
}

PyExportDef_Method *PyExport_spam_methods(void) {
    return spam_methods;
}

/* exec is a separate export in this scheme, not a struct member */
int PyExport_spam_exec(PyObject *mod) {
    return spam_exec(mod);
}

Using slots instead, the last part (from spam_module down) would
revert to being closer to your example:

static PyExportDef_ModuleSlot spam_slots[] = {
    {Py_m_doc, PyDoc_STR("A spammy module")},
    {Py_m_methods, spam_methods},
    {Py_m_statedef, &spam_statedef},
    {Py_m_exec, spam_exec},
    {0, NULL}
};

PyExportDef_ModuleSlot *PyExport_spam(void) {
    return spam_slots;
}

So actually writing that down suggests numeric slots may still be a
better idea. I like the "PyExport" and "PyExportDef" prefixes though.
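For comparison, a sketch of what the loader must do with numbered
slots: validate each ID and reject unknown or repeated ones. The slot
IDs and struct are hypothetical mirrors of the example above; the value
is const void * so a string literal can be stored without a cast, and
Py_m_exec is left out to keep the sketch within strict ISO C, which
does not define conversions between function pointers and void *:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot IDs and slot struct, not an existing CPython API. */
enum { Py_m_doc = 1, Py_m_methods, Py_m_statedef };

typedef struct { int slot; const void *value; } ExportDef_ModuleSlot;

typedef struct {
    const char *doc;
    const void *methods;
    const void *statedef;
} module_parts;

/* Copy each slot into its field; 0 on success, -1 on an unknown or
   repeated slot ID. */
static int read_slots(const ExportDef_ModuleSlot *slots, module_parts *out)
{
    unsigned seen = 0;
    assert(slots != NULL && out != NULL);
    out->doc = NULL;
    out->methods = NULL;
    out->statedef = NULL;
    for (; slots->slot != 0; slots++) {
        unsigned bit;
        if (slots->slot < Py_m_doc || slots->slot > Py_m_statedef)
            return -1;                  /* unknown slot ID */
        bit = 1u << slots->slot;
        if (seen & bit)
            return -1;                  /* same slot given twice */
        seen |= bit;
        switch (slots->slot) {
        case Py_m_doc:     out->doc = slots->value; break;
        case Py_m_methods: out->methods = slots->value; break;
        default:           out->statedef = slots->value; break;
        }
    }
    return 0;
}

static int spam_methods_stub, spam_statedef_stub;   /* placeholders */

static const ExportDef_ModuleSlot spam_slots[] = {
    {Py_m_doc, "A spammy module"},
    {Py_m_methods, &spam_methods_stub},
    {Py_m_statedef, &spam_statedef_stub},
    {0, NULL}
};
```

The whole definition stays in one exported symbol, at the price of this
per-slot validation (and the casts) that named exports would avoid.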

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

