[Import-SIG] On singleton modules, heap types, and subinterpreters

Petr Viktorin encukou at gmail.com
Sun Jul 26 15:49:10 CEST 2015


 On Sun, Jul 26, 2015 at 2:50 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 26 July 2015 at 20:39, Petr Viktorin <encukou at gmail.com> wrote:
>> So it seems that extension modules that need per-module state need to
>> use heap types. And the heap types need a reference to "their" module.
>> And methods of those types need to be called with the class that
>> defined them.
>> This would be possible with regular methods. But, consider for example
>> the tp_iternext signature:
>>
>>     PyObject* myobj_iternext(PyObject *self)
>>
>> There's no good way for this function to get a reference to the class
>> it belongs to.
>> `Py_TYPE(self)` might be a subclass. The best way I can think of is
>> walking the MRO until I get to a class with tp_iter (or a class
>> created from "my" known PyType_Spec), but one of the requirements on
>> module state is that it needs to be efficient, so I'd rather avoid
>> walking a list.
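To make the cost concrete, the MRO-walking fallback I mention above would look roughly like this (toy structures standing in for real CPython types, and a plain base-class chain instead of a real MRO):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, not real CPython structures: a plain base-class
 * chain instead of a real MRO. */
typedef struct WalkType {
    struct WalkType *base;  /* next class up the chain */
    const void *spec;       /* the PyType_Spec this class was created from */
} WalkType;

/* The fallback: starting from Py_TYPE(self), walk upwards until we
 * find the class created from "my" known spec.  O(depth of the
 * hierarchy) on every call, which is the cost I'd like to avoid. */
static WalkType *
find_defining_class(WalkType *start, const void *my_spec)
{
    for (WalkType *t = start; t != NULL; t = t->base) {
        if (t->spec == my_spec)
            return t;
    }
    return NULL;
}
```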
>>
>> That's where I'm currently stuck. Does anyone have any ideas/comments
>> on this problem?
>
> (I'm assuming I'm going to be retreading ground you've already covered
> in your own investigations here, but I want to make sure we're at
> least thinking along the same lines)
>
> Let's start by assuming the following constraints:
>
> * we can add new standard function signatures
> * we can add new calling convention flags
> * we *can't* change slot signatures
>
> Tackling the easy problem first, the new standard function signatures could be:
>
>     PyObject* (*PyCMethod)(PyObject *module, PyObject *self, PyObject *args)
>
>     PyObject* (*PyCMethodWithKeywords)(PyObject *module, PyObject *self,
>                                        PyObject *args, PyObject *kwds)
>
> The new calling conventions would be METH_VARARGS_METHOD,
> METH_KEYWORDS_METHOD and METH_NOARGS_METHOD (probably implemented as a
> single new flag like METH_MODULE that these set).
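To make the proposed convention concrete, here is a self-contained toy sketch of the dispatch difference. Every name in it is an invented stand-in, not real CPython API:

```c
#include <assert.h>
#include <stddef.h>

/* Invented stand-ins -- none of these names are real CPython API. */
typedef struct { const char *name; } FakeObject;

#define FAKE_METH_VARARGS 0x0001
#define FAKE_METH_MODULE  0x0100  /* hypothetical "also pass the module" flag */

typedef FakeObject *(*FakeCFunction)(FakeObject *self_or_module,
                                     FakeObject *args);
typedef FakeObject *(*FakeCMethod)(FakeObject *module, FakeObject *self,
                                   FakeObject *args);

typedef struct {
    int flags;
    union { FakeCFunction fn; FakeCMethod meth; } func;
} FakeMethodDef;

/* The key difference: with the new flag set, a method receives the
 * instance *in addition to* the module; without it, the instance
 * (for methods) or the module (for functions) takes the first slot. */
static FakeObject *
fake_dispatch(FakeMethodDef *def, FakeObject *module, FakeObject *self,
              FakeObject *args)
{
    if (def->flags & FAKE_METH_MODULE)
        return def->func.meth(module, self, args);
    return def->func.fn(self != NULL ? self : module, args);
}

/* Two example implementations, returning their first argument so a
 * caller can observe what was passed in. */
static FakeObject *new_style(FakeObject *module, FakeObject *self,
                             FakeObject *args)
{ (void)self; (void)args; return module; }

static FakeObject *old_style(FakeObject *self_or_module, FakeObject *args)
{ (void)args; return self_or_module; }
```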
>
> The key difference between the *_METHOD conventions and their existing
> PyCFunction counterparts is that when you use the latter for methods
> on a class, the class instance is passed in *instead of* the module
> reference, while with this change, methods on a class would receive
> the instance *in addition to* the module reference.
>
> To facilitate this, type objects would also need to gain a new
> __module__ attribute.
>
> Ignoring slots, extension modules written for Python 3.6+ could then
> just use the PyCFunction calling conventions for module level
> functions, and the new PyCMethod ones for actual methods on extension
> classes, and things should "just work". Extension modules (including
> Cython) that needed to maintain compatibility with older versions
> could implement wrappers that used PyState_FindModule to pass in the
> appropriate module name and use those in combination with single-phase
> initialisation on older versions that didn't support the new call
> signatures.

Yes, that's pretty much what I had in mind when I said "This would be
possible with regular methods" :)
Rather than the module, I'd pass in the defining class, and let the
method look up __module__ itself. But that's a pretty minor difference.
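To illustrate why passing the defining class is enough: the class can simply carry a pointer to its module, so the method reaches per-module state in one dereference even when Py_TYPE(self) is a subclass. Toy structures again, not real CPython:

```c
#include <assert.h>
#include <stddef.h>

/* Toy structures, not real CPython. */
typedef struct { int state; } ToyModule;

typedef struct ToyType {
    struct ToyType *base;  /* single inheritance, for the sketch */
    ToyModule *module;     /* the module that defined this class */
} ToyType;

typedef struct {
    ToyType *type;  /* Py_TYPE(self) analogue -- may be a subclass */
} ToyObject;

/* With the defining class passed in explicitly, reaching per-module
 * state is a single pointer dereference, no matter how deep the
 * subclass hierarchy around Py_TYPE(self) is. */
static ToyModule *
method_impl(ToyType *defining_class, ToyObject *self)
{
    (void)self;  /* the instance is still available when needed */
    return defining_class->module;
}
```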

> For the slot case, where we can't change the function signature to
> accept the module object directly, I'm wondering if we could take a
> leaf out of the decimal module's book and define the notion of a
> thread local "active module", together with a way to automatically
> define slot wrappers that manage the active module. The latter might
> look something like:
>
>     PyObject* PyType_FromSpecInModule(PyType_Spec* spec, PyModule* module,
>                                       int* wrapped_slot_ids)
>
> With the following consequences:
>
> * the newly defined type would have its __module__ attribute set appropriately
> * the slots named in the NULL terminated "wrapped_slot_ids" array
> would be replaced with wrappers that pushed the given module onto the
> active module stack, called the function supplied in the type spec,
> and popped the active module off again (as a possible optimisation,
> there could potentially be a counter for how many times the currently
> active module had been pushed, rather than actually pushing the same
> pointer multiple times)
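As a standalone sketch of that push/pop machinery, including the repeat-counter optimisation (all names invented; a real version would make the stack thread-local, or hang it off the thread state):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int state; } SketchModule;  /* stand-in for a module object */

#define STACK_MAX 64

/* In a real implementation these would be per-thread; plain statics
 * keep the sketch simple. */
static SketchModule *stack[STACK_MAX];
static unsigned repeat[STACK_MAX];  /* push count for each entry */
static int top = -1;

static void
active_module_push(SketchModule *m)
{
    if (top >= 0 && stack[top] == m) {
        repeat[top]++;  /* same module pushed again: just bump the counter */
        return;
    }
    assert(top + 1 < STACK_MAX);
    top++;
    stack[top] = m;
    repeat[top] = 1;
}

static void
active_module_pop(void)
{
    assert(top >= 0);
    if (--repeat[top] == 0)
        top--;
}

static SketchModule *
active_module_get(void)
{
    return top >= 0 ? stack[top] : NULL;
}
```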
>
> That then gets us to your original hard question, which is "How would
> the slot wrappers look up the correct module?". There, I think the
> definition time "fixup_slots" operations in the type machinery may
> help: this is the code where the function pointers are copied from the
> base classes to the slots in the class currently being defined. If
> there was a way of flagging "module aware" slots at type definition
> time, then that same code (or an equivalent loop run later on) could
> be used to populate a mapping from slot IDs to the appropriate module
> object.
>
> The fastest and simplest way I can think of to do the module object
> lookup would be to have a C level PyObject* array keyed by the
> PyType_Slot slot IDs - finding the right module would then be a matter
> of having predefined wrappers for each slot that looked up the
> appropriate slot ID to get both the module to activate and the
> function pointer for the actual slot implementation. Any type defined
> using PyType_FromSpecInModule with a non-NULL "wrapped_slot_ids" would
> incur the same memory cost in terms of the size of the type object
> itself.
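Sketching that slot-ID-keyed table in plain C (slot IDs and structure names invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Invented slot IDs, standing in for the PyType_Slot numbering. */
enum { SLOT_ITERNEXT, SLOT_REPR, NUM_SLOTS };

typedef struct { int state; } DemoModule;
typedef void *(*slot_impl)(void *self);

/* One entry per slot ID: a single array index yields both the module
 * to activate and the real slot implementation. */
typedef struct {
    DemoModule *module;  /* NULL when the slot is not module-aware */
    slot_impl impl;
} SlotBinding;

typedef struct {
    SlotBinding bindings[NUM_SLOTS];  /* constant per-type memory cost */
} DemoType;

static const SlotBinding *
lookup_slot(const DemoType *tp, int slot_id)
{
    assert(0 <= slot_id && slot_id < NUM_SLOTS);
    return &tp->bindings[slot_id];
}

/* Example implementation that just echoes its argument back. */
static void *demo_iternext(void *self) { return self; }
```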
>
> Even though the memory hit for making an extension type module aware
> would be constant using that approach, the runtime speed hit would
> still only affect the specifically wrapped slots that were flagged as
> needing the active module state to be updated around the call.
>
> There'd be a lot of devils in the details of making such a scheme
> work, and we'd want to quantify the impact of converting a slot
> definition from a singleton implementation to a subinterpreter
> friendly implementation, but I'm not seeing anything fundamentally
> unworkable about the above approach. It makes me nervous from a
> maintainability perspective (typeobject.c and function calls are
> already hairy, and this would make both of them worse), but if the
> pay-off is substantially improved subinterpreter support, I think it
> will be worth it (especially if Eric is able to manage the trick of
> allowing subinterpreters to run concurrently on different cores)

That does sound doable, even if it is a pretty arcane workaround.
It should at least serve as a proof of concept, allowing us to explore
this space further.
Thank you!
