[Import-SIG] On singleton modules, heap types, and subinterpreters

Sun Jul 26 12:39:21 CEST 2015

Hello,
This is a follow-up to PEP 489 and discussions regarding per-module
data and PyState_FindModule.
It turned out to be quite the rabbit hole. Apologies for the long
mail, I hope it ends up sufficiently clear.

Using single-phase initialization (the pre-PEP 489 solution),
extension modules are effectively singletons – there's up to one
instance of a particular module in any given subinterpreter. Cython
modules only allow one instance *per process*.

Using the new multiple-phase init, one can create several modules from
one PyModuleDef – either (again) one per subinterpreter, or for
testing purposes. As per the goal of PEP 489, this brings extension
modules closer to how Python modules behave.

The problem is that classes defined in a module don't have a reference
to the module object. For example, the _csv module defines the classes
"reader" and "Error". The code in "reader" needs to have access to
"Error" in order to raise exceptions.
(The Error class here is just an example of module state; things like
_csv's global field size limit or Cython globals also need to be
accessed from classes.)
With the traditional single-phase init, to access module state, one
can use PyState_FindModule, which queries a per-subinterpreter mapping
of PyModuleDef to module object. This obviously assumes one module per
PyModuleDef in each subinterpreter, which is a limitation that PEP 489
currently avoids.
Bringing this limitation back would probably be the easiest solution
to the problem I'm describing here; this has been discussed in the
form of "singleton modules" [0], and postponed in hopes of a better
solution.

So, what options are there for methods of extension classes to get a
hold of the module object (or module state)?
For static classes, it's not possible to store a reference to the
module, because multiple modules can use a single static class.
Also, static classes won't behave well with multiple subinterpreters:
if an object of such a class is passed into a subinterpreter that doesn't
have the corresponding module object loaded, PyState_FindModule will
fail. And PyState_FindModule failure tends to have very nasty
consequences – there's really not much you can do when you get a NULL,
and most modules don't even check. And even if PyState_FindModule
succeeds, in a "foreign" subinterpreter it arguably won't find the
"correct" module instance. So, singleton modules (or the pre-PEP 489
status quo) aren't a good answer.
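
To make the two failure modes above concrete, here is a minimal sketch in
plain C. It is a toy model, not CPython code: ToyModuleDef, ToyModule,
ToyInterp, toy_add_module and toy_find_module are hypothetical names, and
the table is only meant to mimic the "PyModuleDef to module object, per
subinterpreter" mapping described earlier, not CPython's actual data
structures.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are CPython types. */
typedef struct { const char *name; } ToyModuleDef;
typedef struct { const ToyModuleDef *def; int field_size_limit; } ToyModule;

#define TOY_MAX_MODULES 8

/* Each (toy) subinterpreter keeps its own table of loaded modules. */
typedef struct {
    ToyModule *modules[TOY_MAX_MODULES];
    size_t count;
} ToyInterp;

/* Register a module in one interpreter; returns 0 on success, -1 if full. */
static int toy_add_module(ToyInterp *interp, ToyModule *mod)
{
    assert(interp != NULL && mod != NULL);
    if (interp->count >= TOY_MAX_MODULES) {
        return -1;
    }
    interp->modules[interp->count++] = mod;
    return 0;
}

/* Look up the module created from `def` in `interp`.
 * Returns NULL if that interpreter never loaded the module.
 * Only the first match is returned, so a second module created from the
 * same def in the same interpreter can never be found this way. */
static ToyModule *toy_find_module(const ToyInterp *interp,
                                  const ToyModuleDef *def)
{
    assert(interp != NULL && def != NULL);
    for (size_t i = 0; i < interp->count; i++) {
        if (interp->modules[i]->def == def) {
            return interp->modules[i];
        }
    }
    return NULL;
}
```

In this model, a method running in an interpreter that never registered
the module gets NULL back, and a second module created from the same def
is shadowed by the first one. Both outcomes are properties of keying the
lookup on the def alone.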

So it seems that extension modules that need per-module state need to
use heap types. And the heap types need a reference to "their" module.
And methods of those types need to be called with the class that
defined them.
This would be possible with regular methods. But, consider for example
the tp_iternext signature:

    PyObject* myobj_iternext(PyObject *self)

There's no good way for this function to get a reference to the class
it belongs to.
`Py_TYPE(self)` might be a subclass. The best way I can think of is
walking the MRO until I get to a class with tp_iter (or a class
created from "my" known PyType_Spec), but one of the requirements on
module state is that it needs to be efficient, so I'd rather avoid
walking a list.
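
For reference, the MRO walk I have in mind could be sketched roughly as
follows. Again this is a toy model in plain C rather than real C-API code:
ToySpec, ToyType, ToyObject and both functions are hypothetical, and the
sketch assumes that every type records the spec it was created from and
the module that created it, which is exactly the kind of link that is
missing today.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are CPython types. */
typedef struct { const char *name; } ToySpec;

typedef struct ToyType ToyType;
struct ToyType {
    const ToySpec *spec;        /* spec the type was built from, or NULL */
    void *module;               /* module that created the type, or NULL */
    const ToyType *const *mro;  /* NULL-terminated, starts with the type */
};

typedef struct { const ToyType *type; } ToyObject;

/* Walk the MRO of `type` and return the first class created from `spec`,
 * or NULL if there is none. The cost is linear in the MRO length. */
static const ToyType *toy_find_defining_type(const ToyType *type,
                                             const ToySpec *spec)
{
    assert(type != NULL && spec != NULL);
    for (const ToyType *const *p = type->mro; *p != NULL; p++) {
        if ((*p)->spec == spec) {
            return *p;
        }
    }
    return NULL;
}

/* What a slot such as tp_iternext would have to do: start from the
 * (possibly subclassed) type of `self` and recover the module. */
static void *toy_module_from_self(const ToyObject *self, const ToySpec *spec)
{
    const ToyType *defining = toy_find_defining_type(self->type, spec);
    return defining != NULL ? defining->module : NULL;
}
```

The loop is short, but it runs on every call to the slot and gets longer
with every level of subclassing, which is why I'd rather not rely on it.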

That's where I'm currently stuck. Does anyone have any ideas/comments
on this problem?

[0] https://mail.python.org/pipermail/import-sig/2015-April/000946.html