[Import-SIG] On singleton modules, heap types, and subinterpreters

Nick Coghlan ncoghlan at gmail.com
Sun Jul 26 14:50:37 CEST 2015

On 26 July 2015 at 20:39, Petr Viktorin <encukou at gmail.com> wrote:
> So it seems that extension modules that need per-module state need to
> use heap types. And the heap types need a reference to "their" module.
> And methods of those types need to be called with the class that
> defined them.
> This would be possible with regular methods. But, consider for example
> the tp_iternext signature:
>     PyObject* myobj_iternext(PyObject *self)
> There's no good way for this function to get a reference to the class
> it belongs to.
> `Py_TYPE(self)` might be a subclass. The best way I can think of is
> walking the MRO until I get to a class with tp_iter (or a class
> created from "my" known PyType_Spec), but one of the requirements on
> module state is that it needs to be efficient, so I'd rather avoid
> walking a list.
> That's where I'm currently stuck. Does anyone have any ideas/comments
> on this problem?

(I'm assuming I'm going to be retreading ground you've already covered
in your own investigations here, but I want to make sure we're at
least thinking along the same lines)

Let's start by assuming the following constraints:

* we can add new standard function signatures
* we can add new calling convention flags
* we *can't* change slot signatures

Tackling the easy problem first, the new standard function signatures could be:

    PyObject* (*PyCMethod)(PyObject *module, PyObject *self, PyObject *args)

    PyObject* (*PyCMethodWithKeywords)(PyObject *module, PyObject *self,
                                       PyObject *args, PyObject *kwds)

The new calling conventions would be METH_VARARGS_METHOD,
METH_KEYWORDS_METHOD and METH_NOARGS_METHOD (probably implemented as a
single new flag like METH_MODULE that these set).

The key difference between the *_METHOD conventions and their existing
PyCFunction counterparts is that when you use the latter for methods
on a class, the class instance is passed in *instead of* the module
reference, while with this change, methods on a class would receive
the instance *in addition to* the module reference.

To facilitate this, type objects would also need to hold a reference to
the module object that defined them (the existing __module__ attribute
only records the module's name).

Ignoring slots, extension modules written for Python 3.6+ could then
just use the PyCFunction calling conventions for module level
functions, and the new PyCMethod ones for actual methods on extension
classes, and things should "just work". Extension modules (including
Cython) that needed to maintain compatibility with older versions
could implement wrappers that used PyState_FindModule to look up the
appropriate module object, and use those in combination with
single-phase initialisation on older versions that didn't support the
new calling conventions.

For the slot case, where we can't change the function signature to
accept the module object directly, I'm wondering if we could take a
leaf out of the decimal module's book and define the notion of a
thread local "active module", together with a way to automatically
define slot wrappers that manage the active module. The latter might
look something like:

    PyObject* PyType_FromSpecInModule(PyType_Spec *spec, PyObject *module,
                                      int *wrapped_slot_ids)

With the following consequences:

* the newly defined type would have its __module__ attribute set appropriately
* the slots named in the NULL-terminated "wrapped_slot_ids" array
would be replaced with wrappers that pushed the given module onto the
active module stack, called the function supplied in the type spec,
and popped the active module off again (as a possible optimisation,
rather than actually pushing the same pointer multiple times, there
could be a counter for how many times the currently active module had
been pushed)

That then gets us to your original hard question, which is "How would
the slot wrappers look up the correct module?". There, I think the
definition time "fixup_slots" operations in the type machinery may
help: this is the code where the function pointers are copied from the
base classes to the slots in the class currently being defined. If
there was a way of flagging "module aware" slots at type definition
time, then that same code (or an equivalent loop run later on) could
be used to populate a mapping from slot IDs to the appropriate module
objects.

The fastest and simplest way I can think of to do the module object
lookup would be to have a C level PyObject* array keyed by the
PyType_Slot slot IDs - finding the right module would then be a matter
of having predefined wrappers for each slot that looked up the
appropriate slot ID to get both the module to activate and the
function pointer for the actual slot implementation. Any type defined
using PyType_FromSpecInModule with a non-NULL "wrapped_slot_ids" would
incur the same fixed memory cost in terms of the size of the type
object.

Even though the memory hit for making an extension type module aware
would be constant using that approach, the runtime speed hit would
still only affect the specifically wrapped slots that were flagged as
needing the active module state to be updated around the call.

There'd be a lot of devils in the details of making such a scheme
work, and we'd want to quantify the impact of converting a slot
definition from a singleton implementation to a subinterpreter
friendly implementation, but I'm not seeing anything fundamentally
unworkable about the above approach. It makes me nervous from a
maintainability perspective (typeobject.c and function calls are
already hairy, and this would make both of them worse), but if the
pay-off is substantially improved subinterpreter support, I think it
will be worth it (especially if Eric is able to manage the trick of
allowing subinterpreters to run concurrently on different cores).


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
