On 04/14/2016 10:23 AM, Nikita Nemkin wrote:
Reading PEP 3121/PEP 489 I can't stop wondering, why do extension modules require such specialized lifecycle APIs? Why not just let them subclass ModuleType? (Or any type, really, but ModuleType might be a good place to define some standard behavior.)
Good question. I'll list some assumptions; if you don't share them we can talk more about these:
- some things that are easy in Python are not pleasant to do correctly* in C:
  - subclassing
  - creating simple objects
  - setting attributes
- most modules don't need special functionality
* ("Correctly" includes things like error checking, which many third-party modules skimp on.)
Most of the API is in the style of "leave this NULL unless you need something special", i.e. simple things are easy, but the complex cases are possible. Creating custom ModuleType subclasses is possible, but I definitely wouldn't want to require every module author to do that.
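To make the burden concrete, here's roughly what the Python-level equivalent of "every module is a ModuleType subclass" would look like (a minimal sketch; the module name is made up, and the real C version would additionally need a type spec, error checking, etc.):

```python
import sys
import types

# A minimal ModuleType subclass -- the Python-level analogue of what
# every extension author would have to define in C under the proposal.
class GreetingModule(types.ModuleType):
    def greet(self, name):
        return "Hello, " + name

# Instantiate it and register it the way the import machinery would.
mod = GreetingModule("greeting")
sys.modules["greeting"] = mod

import greeting
print(greeting.greet("world"))  # Hello, world
```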
A lot of the API is convenience features, which are important: they do the right thing (for example, w.r.t. error checking), and they're easier to use than the alternatives. This makes the API grow into several layers; for example:
- m_methods in the PyModuleDef
- PyModule_AddFunctions
- PyCFunction_NewEx & PyObject_SetAttrString
Usually you just set the first, but if you need more control (e.g. you're creating modules dynamically), you can use the lower-level tools.
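The lowest layer -- building a module dynamically and attaching each attribute by hand, with nothing declared up front -- has a direct Python-level analogue (a sketch; the module name and function are made up):

```python
import sys
import types

# Lowest layer: create a bare module object and set each attribute
# yourself, roughly what PyCFunction_NewEx + PyObject_SetAttrString
# amount to on the C side.
dyn = types.ModuleType("dyn", "A dynamically created module.")

def add(a, b):
    return a + b

dyn.add = add
sys.modules["dyn"] = dyn

import dyn as d
print(d.add(2, 3))  # 5
```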
Your suggestion won't really help with this kind of complexity.
Oh, and there is a technical reason against subclassing ModuleType unless necessary: custom ModuleType subclasses cannot be made to work with runpy (i.e. python -m). For plain modules (the ones without a custom create_module in the current API), this doesn't *currently* work either, but the PEPs were written so that it's "just" a question of spending some development effort on runpy.
Module instance naturally encapsulates C level module state. GC/finalization happens just like for any other object. PEP 3121 becomes redundant.
It doesn't become fully redundant: m_methods is still useful.
The rest of PyModuleDef makes the more common complex cases simpler than a full-blown ModuleType subclass.
Two-step initialization (PEP 489) can be achieved by defining a new kind of PyInit_XXX entry point, returning a module *type*, instead of a module *instance*. No extra API needed beyond that!
With the current API, you don't return a module *instance*, but a module *description* (PyModuleDef). This is a lot easier than creating a subclass in C. With your suggestion, I fear that someone would quickly come up with a macro to automate creating simple ModuleType instances, and at that point the API would be as complex as it is now, but every module instance would now also have an extra ModuleType subclass – and I don't think that's either simpler or more effective.
Now, the importer can simply instantiate this module type, passing __name__, __file__ and the rest. ModuleType.tp_new will perform attribute init, sys.modules registration etc. Or the importer can call tp_new/tp_init and do the attribute setup manually, supplanting type_call. (This is closer to the current way of doing things.)
Actual module initialization ("executing the module body") happens in tp_init. reload() is equivalent to calling tp_init again.
Subinterpreter interaction becomes transparent: every interpreter instantiates its own module copy. "Singleton" modules with external global state should fail second instantiation (maybe by deriving from a special SingletonModuleType subclass that will handle it for them).
Current status: every interpreter instantiates its own module instance. "Singleton" modules with external global state are marked as such, and should be written so that they fail second instantiation. (Maybe the failing can be automated by the import machinery, but that part is not yet implemented). It seems to me that adding a custom ModuleType subclass to the mix wouldn't change much.
Additionally, a custom module type allows fine-grained attribute access control (aka metamodules), useful to many complex modules. Module "variables" synchronized with C-level state become super-easy to define (tp_members). For lazy loading and importing there are tp_getattro, tp_getset, etc.
Right. This is the kind of thing you *do* need a ModuleType subclass for, and the current API makes it possible to do it.
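At the Python level, the effect of such a subclass can be previewed by swapping a module's __class__ (a sketch; the names are made up -- CPython specifically permits __class__ assignment on module instances for this kind of thing):

```python
import sys
import types

# A "metamodule": a ModuleType subclass with a computed attribute,
# the Python-level counterpart of tp_getset / tp_getattro in C.
class LazyModule(types.ModuleType):
    @property
    def expensive(self):
        # Computed on access instead of at import time.
        return sum(range(1000))

mod = types.ModuleType("lazy_demo")
mod.__class__ = LazyModule   # allowed for module instances
sys.modules["lazy_demo"] = mod

print(mod.expensive)  # 499500
```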
One problem not solved by this approach (nor the current approach) is module state access from methods of extension types. At least two solutions are possible:
- Look up the module by name (sys.modules) or type (new per-interpreter cache).
- Define a new METH_XXX calling convention (or flag) and pass both PyCFunctionObject.m_self and PyCFunctionObject.m_module to the C level method implementations.
Both can be implemented, #1 being simple and #2 being proper.
*This* is the real problem now. I think #2 is viable, and I'm slowly (too slowly, perhaps) working on it.
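For reference, approach #1 sketched at the Python level (module and attribute names are made up): a method recovers its module's state by name through sys.modules, which is why it's simple but also fragile -- renaming or re-importing the module breaks the link.

```python
import sys
import types

# Per-module state lives on the module object itself.
state_mod = types.ModuleType("_counter_demo")
state_mod.count = 0
sys.modules["_counter_demo"] = state_mod

class Counter:
    # Approach #1: a method of a type defined by the module finds the
    # module's state by looking the module up by name.
    def bump(self):
        mod = sys.modules["_counter_demo"]
        mod.count += 1
        return mod.count

c = Counter()
print(c.bump(), c.bump())  # 1 2
```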
What do you think?
I personally think that your suggestion wouldn't make the API substantially simpler, assuming you would keep it as robust and easy to use as the current solution. And if you would want to maintain backwards compatibility (even with only the pre-PEP 489 state), it would be even harder.
Many people thought about the current APIs, and (almost) all of the unpleasant decisions we had to make do have their reasons. (And if you ask about a more specific decision, I can give you the specific reasons.)