[Python-ideas] Module lifecycle: simple alternative to PEP 3121/PEP 489

Thu Apr 14 05:57:19 EDT 2016

On 04/14/2016 10:23 AM, Nikita Nemkin wrote:
> Reading PEP 3121/PEP 489 I can't stop wondering, why do extension
> modules require such specialized lifecycle APIs? Why not just let
> them subclass ModuleType? (Or any type, really, but ModuleType might
> be a good place to define some standard behavior.)

Good question.
I'll list some assumptions; if you don't share them we can talk more
about these:
- some things are easy in Python, but not pleasant to do correctly* in C:
  - subclassing
  - creating simple objects
  - setting attributes
- most modules don't need special functionality

* ("Correctly" includes things like error checking, which many
third-party modules skimp on.)

Most of the API is in the style of "leave this NULL unless you need
something special", i.e. simple things are easy, but the complex cases
are possible. Creating custom ModuleType subclasses is possible, but I
definitely wouldn't want to require every module author to do that.

A lot of the API is convenience features, which are important: they do
the right thing (for example, w.r.t. error checking), and they're easier
to use than alternatives. This makes the API grow into several layers;
for example:
- m_methods in the PyModuleDef
- PyModule_AddFunctions
- PyCFunction_NewEx & PyObject_SetAttrString
Usually you just set the first, but if you need more control (e.g.
you're creating modules dynamically), you can use the lower-level tools.

Your suggestion won't really help with this kind of complexity.

Oh, and there is a technical reason against subclassing ModuleType
unless necessary: Custom ModuleType subclasses cannot be made to work
with runpy (i.e. python -m). For plain ModuleType instances (modules
without a custom create_module in the current API), this doesn't
*currently* work either, but the PEPs were written so that it's "just" a
question of spending some development effort on runpy.

> Module instance naturally encapsulates C level module state.
> GC/finalization happens just like for any other object. PEP 3121
> becomes redundant.

It doesn't become fully redundant: m_methods is still useful.

The rest of PyModuleDef makes the more common complex cases simpler than
a full-blown ModuleType subclass.

> Two-step initialization (PEP 489) can be achieved by defining
> a new kind of PyInit_XXX entry point, returning a module *type*,
> instead of a module *instance*. No extra API needed beyond that!

With the current API, you don't return a module *instance*, but a module
*description* (PyModuleDef). This is a lot easier than creating a
subclass in C.
With your suggestion, I fear that someone would quickly come up with a
macro to automate creating simple ModuleType subclasses, and at that
point the API would be as complex as it is now, but every module
instance would now also have an extra ModuleType subclass – and I don't
think that's either simpler or more efficient.
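
To make the two steps concrete, here is a rough Python-level sketch. As I
understand it, the create/execute split for extension modules mirrors the
create_module/exec_module hooks of Python-level loaders. The names
SketchLoader, sketch_mod and answer are made up for illustration; this is
an analogy, not what the C import machinery literally runs.

```python
import importlib.abc
import importlib.util
import types


class SketchLoader(importlib.abc.Loader):
    """Hypothetical loader whose two hooks correspond to the two phases."""

    def create_module(self, spec):
        # Returning None asks the import machinery for a plain ModuleType,
        # which is the "leave this NULL unless you need something special" case.
        return None

    def exec_module(self, module):
        # "Executing the module body": populate the already-created module.
        module.answer = 42


spec = importlib.util.spec_from_loader("sketch_mod", SketchLoader())
module = importlib.util.module_from_spec(spec)  # phase 1: create
spec.loader.exec_module(module)                 # phase 2: execute
```

No custom type is involved here: the loader only describes what should
happen, and the machinery supplies the module object.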

> Now, importer can simply instantiate this module type, passing
> __name__, __file__ and the rest. ModuleType.tp_new will perform
> attribute init, sys.modules registration etc.
> OR
> the importer can manually pull tp_new/tp_init/attribute setup, supplanting
> type_call. (This is closer to the current way of doing things.)
> 
> Actual module initialization ("executing the module body")
> happens in tp_init. reload() is equivalent to calling tp_init again.
> 
> Subinterpreter interaction becomes transparent: every interpreter
> instantiates its own module copy. "Singleton" modules with
> external global state should fail second instantiation
> (maybe by deriving from a special SingletonModuleType subclass
> that will handle it for them).

Current status: every interpreter instantiates its own module instance.
"Singleton" modules with external global state are marked as such, and
should be written so that they fail second instantiation. (Maybe the
failing can be automated by the import machinery, but that part is not
yet implemented).
It seems to me that adding a custom ModuleType subclass to the mix
wouldn't change much.
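
A minimal Python sketch of "fail second instantiation", assuming a
process-wide flag stands in for the external global state. The names
_instantiated and create_singleton_module are hypothetical; a real
extension would do the equivalent check in C.

```python
import types

# Stands in for process-wide state that cannot be duplicated per interpreter.
_instantiated = False


def create_singleton_module(name):
    """Create the module once; refuse any later attempt."""
    global _instantiated
    if _instantiated:
        raise ImportError(f"{name} does not support a second instance")
    _instantiated = True
    return types.ModuleType(name)


first = create_singleton_module("singleton_sketch")
try:
    create_singleton_module("singleton_sketch")
    second_failed = False
except ImportError:
    second_failed = True
```

The check lives in module creation, so it looks the same whether or not a
custom ModuleType subclass is involved.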

> Additionally, custom module type allows fine grained attribute
> access control (aka metamodules), useful to many complex modules.
> C synchronized module "variables" become super-easy to define
> (tp_members). For lazy loading and importing there's
> tp_getattro, tp_getset, etc.

Right. This is the kind of thing you *do* need a ModuleType subclass
for, and the current API makes it possible to do it.
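
For illustration, this is roughly what such a subclass buys you, sketched
in Python rather than C. LazyModule, lazy_sketch and table are made-up
names, and the "expensive" computation is just a placeholder.

```python
import types


class LazyModule(types.ModuleType):
    """Hypothetical module type with one lazily computed attribute."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        if name == "table":
            value = [n * n for n in range(5)]  # placeholder for costly work
            setattr(self, name, value)         # cache in the module dict
            return value
        raise AttributeError(
            f"module {self.__name__!r} has no attribute {name!r}"
        )


mod = LazyModule("lazy_sketch")
was_cached_before = "table" in vars(mod)
table = mod.table
is_cached_after = "table" in vars(mod)
```

Doing the same in C means filling in tp_getattro (or tp_getset, tp_members)
on a real type, which is exactly the extra work most modules don't need.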

> One problem not solved by this approach (nor the current approach)
> is module state access from methods of extension types.
> At least two solutions are possible:
> 1. Look up the module by name (sys.modules) or type (new per-interpreter
>    cache).
> 2. Define a new METH_XXX calling convention (or flag) and pass both
>    PyCFunctionObject.m_self and PyCFunctionObject.m_module
>    to the C level method implementations.
> Both can be implemented, #1 being simple and #2 being proper.

*This* is the real problem now. I think #2 is viable, and I'm slowly
(too slowly perhaps) working on it.
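
The gap is visible even from Python, at least on CPython as far as I know:
a module-level C function gets the module as its "self", but a method of a
type gets the instance, so there is nowhere for the module to come from.
The variable names below are just for illustration.

```python
import builtins
import math

# Module-level C functions: __self__ is the module object itself.
module_of_sqrt = math.sqrt.__self__
module_of_len = len.__self__

# Methods of a built-in type: __self__ is the instance, not the module.
items = []
receiver = items.append.__self__
```

Solution #2 would amount to handing the method implementation the module
in addition to the instance.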

> What do you think?

I personally think that your suggestion wouldn't make the API
substantially simpler, assuming you kept it as robust and easy to
use as the current solution. And if you wanted to maintain backwards
compatibility (even with only the pre-PEP 489 state), it would be even
harder.

Many people thought about the current APIs, and (almost) all of the
unpleasant decisions we had to make do have their reasons. (And if you
ask about a more specific decision, I can give you the specific reasons.)