Re: [Python-ideas] Module lifecycle: simple alternative to PEP 3121/PEP 489

On Thu, Apr 14, 2016 at 2:57 PM, Petr Viktorin <encukou@gmail.com> wrote:
I agree that the C API is quite difficult and dangerous to use directly (that's why I always use Cython), but I don't agree that a separate module init API makes things simpler. PyModuleDef appears to be a PyTypeObject surrogate with similar (or identical?) semantics. That's an extra concept to learn about, an extra bit of documentation to consult when writing new code. PyTypeObject, on the other hand, is fundamental and unavoidable, regardless of the module init system. Practically speaking, PyModuleDef_Init(&spam_def) and PyType_Ready(&spam_type) differ only in the number of zeroes in their respective static structs. There's also PyType_FromSpec (stable ABI), which might appeal to some people more than PyType_Ready.
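To make the comparison concrete, here is a minimal sketch (the spam_def/spam_type names are illustrative) of the two kinds of static struct and their one-line initialization calls:

    #include <Python.h>

    /* Minimal module definition: every field not listed is zero. */
    static struct PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,
        "spam",                         /* m_name */
    };

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        /* Multi-phase init (PEP 489): just hand back the definition. */
        return PyModuleDef_Init(&spam_def);
    }

    /* Minimal static type: again, every field not listed is zero. */
    static PyTypeObject spam_type = {
        PyVarObject_HEAD_INIT(NULL, 0)
        .tp_name = "spam.Spam",
        .tp_basicsize = sizeof(PyObject),
        .tp_flags = Py_TPFLAGS_DEFAULT,
    };

    /* ...and, during module initialization:
       if (PyType_Ready(&spam_type) < 0) { ... handle error ... } */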
I didn't even know that python -m supported extension modules.
ModuleType subclasses have tp_methods. A little check in PyType_Ready can make it behave like PyModuleDef.m_methods. Modules don't need normal methods anyway.
One or two standard macros that help with ModuleType subclassing are in order. Something like PyModule_HEAD_INIT, whose usage is already obvious from the name.
Regarding backward compatibility, PEP 3121 and most of PEP 489 will work just fine as a facade over a ModuleType subclass. I understand that the existing PEPs are well researched and there's no real incentive to change. My proposal is more hypothetical in nature.

Nikita Nemkin <nikita@...> writes:
I understand that the existing PEPs are well researched and there's no real incentive to change. My proposal is more hypothetical in nature.
Petr obviously has researched all this carefully, but there is an incentive: _decimal, for example, takes a speed hit of 20% with PEP 3121, so it's not implemented there. I suspect that the later PEP also slows down modules (which does not matter most of the time). Any new proposal should absolutely address the performance issue.

Stefan Krah

On 14 April 2016 at 23:51, Nikita Nemkin <nikita@nemkin.ru> wrote:
The internal details of PyTypeObject are eminently avoidable, either by not defining your own custom types in C code at all (you can get a long way with C level acceleration just by defining functions and using instances of existing types), or else by only defining them dynamically as heap types (the kind created by class statements) via PyType_FromSpec: https://www.python.org/dev/peps/pep-0384/#type-objects

To answer your original question, though, PEP 489 needs to be read in the context of PEP 451, which was the one that switched the overall import system over to the multi-phase import model: https://www.python.org/dev/peps/pep-0451/

One of the main goals of importlib in general is to let the interpreter do more of the heavy lifting for things that absolutely have to be done correctly if you want your import hook or module to behave "normally". With Python level modules, the interpreter has always taken care of creating the module for "normal" imports, with the module author only having to care about populating that namespace with content (by running Python code).

PEP 451 extended that same convenience to authors of module loaders, as they could now just define a custom exec_module, and use the default module creation code rather than having to write their own as part of a load_module implementation. PEP 489 then brought that capability to extension modules: extension module authors can now decide not to worry about module creation at all, and instead just use the Exec hook to populate the standard module object that CPython provides by default.

That means caring about the module creation step in an extension module is now primarily a matter of performance optimisation for access to module global state - as Stefan notes, the indirection mechanism in PEP 3121 can be significantly slower than using C level static variables, and indirection through a Python level namespace is likely to be even slower. However, even in those cases, the PEP 489 mechanism gives the extension module reliable access to information it didn't previously have access to, since the create method receives a fully populated module spec.

Once the question is narrowed down to "How can an extension module fully support subinterpreters and multiple Py_Initialize/Finalize cycles without incurring PEP 3121's performance overhead?" then the short answer becomes "We don't know, but ideas for that are certainly welcome, either here or over on import-sig".

Returning custom module subclasses from the Create hook is certainly one mechanism for that (it's why supporting such subclasses was a design goal for PEP 489), but like other current solutions, they run afoul of the problem that methods defined in C extension modules currently don't receive a reference to the defining module, only to the class instance (which is a general problem with the way methods are defined in C rather than a problem with the import system specifically).

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
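For readers less familiar with the Exec hook Nick describes, a minimal PEP 489 multi-phase init module that leaves creation entirely to the interpreter might look roughly like this (the module name and the constant added are illustrative):

    #include <Python.h>

    /* Exec hook: populate the module object the import system created. */
    static int
    spam_exec(PyObject *module)
    {
        return PyModule_AddIntConstant(module, "answer", 42);
    }

    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };

    static struct PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,
        .m_name = "spam",
        .m_size = 0,                    /* no per-module C state in this sketch */
        .m_slots = spam_slots,
    };

    PyMODINIT_FUNC
    PyInit_spam(void)
    {
        /* No module object is created here; the import system does that. */
        return PyModuleDef_Init(&spam_def);
    }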

Thanks for your input. I now see how things evolved to the present state. In the context of PEP 451, my proposal would have been to move all default module creation tasks to ModuleType.tp_new (taking an optional spec parameter), making the separate Create and Exec hooks unnecessary. Too late, I guess.
I mentioned a way to avoid the state access overhead in my first post. It's independent of the module loading mechanism:

1) Define a new "calling convention" flag like METH_GLOBALS.
2) Store a module ref in PyCFunctionObject.m_module (currently it stores only the module name).
3) Pass the module ref as an extra arg to methods with the METH_GLOBALS flag.
4) Reimplement PyModule_State as a macro; it would amount to one indirection from the passed parameter.

I suspect that most C ABIs allow passing the extra arg unconditionally (this is certainly the case for x86 and x64 on Windows and Linux), meaning that METH_GLOBALS won't increase the actual number of possible dispatch targets in PyCFunction_Call and won't impact Python-to-C call performance at all.
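Purely as illustration, a rough sketch of what the proposal might look like in a module author's code (METH_GLOBALS, the extra module parameter, and the state layout below are all hypothetical, not existing CPython API):

    #include <Python.h>

    /* Hypothetical sketch only: METH_GLOBALS does not exist. The idea is
       that the dispatcher passes the defining module as an extra argument,
       so state access becomes a single indirection. */

    typedef struct {
        PyObject *cached_type;          /* example per-module state */
    } spam_state;

    /* Hypothetical one-indirection replacement for PyModule_State(). */
    #define SPAM_STATE(mod) ((spam_state *)PyModule_GetState(mod))

    static PyObject *
    spam_frobnicate(PyObject *self, PyObject *args, PyObject *module)
    {                                   /* extra 'module' arg: hypothetical */
        spam_state *st = SPAM_STATE(module);
        Py_INCREF(st->cached_type);
        return st->cached_type;
    }

    static PyMethodDef spam_methods[] = {
        /* METH_GLOBALS (hypothetical) would tell the dispatcher to pass
           the module; without it this entry stays plain METH_VARARGS. */
        {"frobnicate", (PyCFunction)spam_frobnicate,
         METH_VARARGS /* | METH_GLOBALS */, NULL},
        {NULL, NULL, 0, NULL}
    };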

Nikita Nemkin <nikita@...> writes:
I mentioned a way to avoid the state access overhead in my first post. It's independent of the module loading mechanism:
It's great to see people discussing this. I must clarify the 20% slowdown figure that I posted earlier: the slowdown was due to changing the static variables to module state *and* the static types to heap types. It was recommended at the time to do both or nothing. I haven't measured the module state impact in isolation.

Stefan Krah

On 04/15/2016 10:59 AM, Nikita Nemkin wrote:
Wouldn't that break backwards compatibility, though?
My planned approach is a bit more flexible:

- Add a reference to the module (ht_module) to heap types.
- Create a calling convention METH_METHOD, where methods are passed the class that defines the method (which might be Py_TYPE(self) or a superclass of it).

This way methods can get both the module state and the class they are defined on, and the replacement for PyModule_State is two indirections (see the sketch below). Still, neither approach will work with slot methods (e.g. nb_add), where there's no space in the API to add an extra argument. Nick proposed a solution in import-sig [0], which is workable but not elegant. But, I think:

- METH_METHOD would be useful even if it doesn't solve the problem with slot methods.
- A good solution to the slot methods problem is unlikely to render METH_METHOD obsolete.

So perhaps solving the 90% case first would be OK.

[0] https://mail.python.org/pipermail/import-sig/2015-July/001035.html
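As illustration only, a rough sketch of what a METH_METHOD-style method might look like under this proposal (the defining_class parameter, the ht_module field it relies on, and the PyType_GetModuleState helper are assumptions of this sketch, not API that exists at the time of this thread):

    #include <Python.h>

    /* Hypothetical sketch: a method that also receives the class defining
       it, so module state is reachable in two indirections
       (defining class -> ht_module -> module state). */

    typedef struct {
        PyObject *default_context;      /* example per-module state */
    } spam_state;

    static PyObject *
    spam_method(PyObject *self, PyTypeObject *defining_class, PyObject *args)
    {
        /* PyType_GetModuleState is assumed here: it would follow the
           hypothetical ht_module pointer and return the state buffer. */
        spam_state *st = (spam_state *)PyType_GetModuleState(defining_class);
        if (st == NULL)
            return NULL;
        Py_INCREF(st->default_context);
        return st->default_context;
    }

    static PyMethodDef spam_type_methods[] = {
        /* METH_METHOD would tell the dispatcher to pass defining_class. */
        {"method", (PyCFunction)spam_method,
         METH_VARARGS /* | METH_METHOD */, NULL},
        {NULL, NULL, 0, NULL}
    };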

On Fri, Apr 15, 2016 at 2:24 PM, Petr Viktorin <encukou@gmail.com> wrote:
It will, and I consider this level of breakage acceptable. Alternatively, another field can be added to this struct.
I've read the linked import-sig thread and realized the depth of the issues involved... Those poor heap type methods don't even have access to their own type pointer! In light of that, your variant is more useful than mine. Still, without a good slot support option, new METH_X conventions don't look attractive at all: such a fundamental change, yet it only solves half of the problem.

Also, MRO walking is actually not as slow as it seems. Checking Py_TYPE(self) and Py_TYPE(self)->tp_base *inline* will minimize the performance overhead in the (very) common case. If non-slot methods had a suitable static anchor (the equivalent of the slot function address for slots), they could use MRO walking too.
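A hedged sketch of the MRO-walking idea for a slot function (the module state lookup itself is left out, since it depends on the hypothetical ht_module field; only the inline fast path plus the fallback walk are shown):

    #include <Python.h>

    /* Hypothetical sketch: find the class whose nb_add slot is this very
       function, checking the two most likely candidates inline before
       falling back to a full MRO walk. */
    static PyObject *spam_nb_add(PyObject *self, PyObject *other);

    static PyTypeObject *
    spam_find_defining_class(PyObject *self)
    {
        PyTypeObject *tp = Py_TYPE(self);

        /* Fast path: the object's own type or its immediate base. */
        if (tp->tp_as_number && tp->tp_as_number->nb_add == spam_nb_add)
            return tp;
        if (tp->tp_base && tp->tp_base->tp_as_number
                && tp->tp_base->tp_as_number->nb_add == spam_nb_add)
            return tp->tp_base;

        /* Slow path: walk the full MRO. */
        PyObject *mro = tp->tp_mro;
        for (Py_ssize_t i = 0; i < PyTuple_GET_SIZE(mro); i++) {
            PyTypeObject *base = (PyTypeObject *)PyTuple_GET_ITEM(mro, i);
            if (base->tp_as_number && base->tp_as_number->nb_add == spam_nb_add)
                return base;
        }
        return NULL;
    }

    static PyObject *
    spam_nb_add(PyObject *self, PyObject *other)
    {
        PyTypeObject *defining = spam_find_defining_class(self);
        if (defining == NULL)
            return NULL;
        /* ...module state would be looked up from 'defining' here... */
        Py_RETURN_NOTIMPLEMENTED;
    }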

Let's move the discussion to import-sig, as Nick explained in the other subthread. Please drop python-ideas from CC when you reply. On 04/15/2016 12:58 PM, Nikita Nemkin wrote:
Well, it solves the problem for methods that have calling conventions, and I'm pretty sure by now that a full solution will need this *plus* another solution for slots. So I'm looking at the problems as two separate parts, and I also think that when it comes to writing PEPs, having two separate PEPs would make this more understandable.
I think so as well. This would mean that module state access in named methods is fast; in slot methods it's possible (and usually fast *enough*), and the full solution with __typeslots__ would still be possible.
If non-slot methods had a suitable static anchor (the equivalent of the slot function address for slots), they could use MRO walking too.
I think a new METH_* calling style and explicit pointers is a better alternative here.

On 15 April 2016 at 18:59, Nikita Nemkin <nikita@nemkin.ru> wrote:
That doesn't work either, as not only aren't modules in general actually required to be instances of ModuleType (see [1]), we also need to be able to create modules to hold __main__, os, sys and _frozen_importlib before we have an import system to manipulate.

That's a large part of the reason we hived off import-sig from python-ideas a while back - the import system involves a whole lot of intertwined arcana stemming from accidents-of-implementation early in Python's history, as well as the flexible import hook system that was defined in PEP 302, so a separate list has proven useful for thrashing out technical details, while we tend to use python-dev and python-ideas more to check the end result is still comprehensible to folks that aren't familiar with all those internals :)

Cheers, Nick.

[1] https://www.python.org/dev/peps/pep-0489/#the-py-mod-create-slot

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

participants (4)
- Nick Coghlan
- Nikita Nemkin
- Petr Viktorin
- Stefan Krah