Make extension module initialisation more like Python module initialisation
Hi,
I suspect that this will be put into a proper PEP at some point, but I'd like to bring this up for discussion first. This came out of issues 13429 and 16392.
http://bugs.python.org/issue13429
http://bugs.python.org/issue16392
Stefan
The problem
===========
Python modules and extension modules are not being set up in the same way. For Python modules, the module is created and set up first, then the module code is being executed. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialisation. This means that it knows neither the __file__ it is being loaded from nor its package (i.e. its FQMN). This hinders relative imports and resource loading. In Py3, it's also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. And without the FQMN, it's not trivial to correctly add the module to sys.modules either.
We specifically run into this for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of a FQMN and correct file path hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are being used at module init time.
The proposal
============
I propose to split the extension module initialisation into two steps in Python 3.4, in a backwards compatible way.
Step 1: The current module init function can be reduced to just creating the module instance and returning it (and potentially doing some simple C level setup). Optionally, after creating the module (and this is the new part), the module init code can register a C callback function that will be called after setting up the module.
Step 2: The shared library importer receives the module instance from the module init function, adds __file__, __path__, __package__ and friends to the module dict, and then checks for the callback. If non-NULL, it calls it to continue the module initialisation by user code.
The callback
============
The callback is defined as follows::

    typedef int (*PyModule_init_callback)(PyObject* the_module, PyModuleInitContext* context);

"PyModuleInitContext" is a struct that is meant mostly for making the callback more future-proof by allowing additional parameters to be passed in. For now, I can see a use case for the following fields::

    typedef struct PyModuleInitContext {
        char* module_name;
        char* qualified_module_name;
    } PyModuleInitContext;

Both names are encoded in UTF-8. As for the file path, I consider it best to retrieve it from the module's __file__ attribute as a Python string object to reduce filename encoding problems.
Note that this struct argument is not strictly required, but given that this proposal would have been much simpler if the module init function had accepted such an argument in the first place, I consider it a good idea not to let this chance pass by again.
The registration of the callback uses a new C-API function::

    int PyModule_SetInitFunction(PyObject* module, PyModule_init_callback callback);

The function name uses "Set" instead of "Register" to make it clear that there is only one such function per module.
An alternative would be a new module creation function "PyModule_Create3()" that takes the callback as third argument, in addition to what "PyModule_Create2()" accepts. This would require users to explicitly pass in the (second) version argument, which might be considered only a minor issue.
Implementation
==============
The implementation requires local changes to the extension module importer and a new C-API function. In order to store the callback, it should use a new field in the module object struct.
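The ordering of the two steps can be illustrated with a self-contained mock in plain C. None of the names below are CPython API: `Module`, `InitContext`, `set_init_function()`, `import_extension()` and the one-slot `registry` are hypothetical stand-ins for the module object, `PyModuleInitContext`, the proposed `PyModule_SetInitFunction()`, the shared library importer and `sys.modules`. The sketch assumes the importer registers the module and sets its file path before running the callback, which is what lets a transitive re-import find the existing module instead of running the init function again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mock types for illustration; these are NOT the CPython API. */
typedef struct Module Module;

typedef struct {
    const char *module_name;           /* e.g. "spam" */
    const char *qualified_module_name; /* e.g. "pkg.spam" (the FQMN) */
} InitContext;

typedef int (*InitCallback)(Module *module, InitContext *context);

struct Module {
    char name[64];              /* stands in for __name__ */
    char file[128];             /* stands in for __file__ */
    InitCallback init_callback; /* at most one callback per module */
    int initialised;
};

static Module *registry = NULL; /* one-slot stand-in for sys.modules */
static int create_calls = 0;    /* counts runs of the step 1 function */

static Module *lookup(const char *qualname) {
    if (registry != NULL && strcmp(registry->name, qualname) == 0)
        return registry;
    return NULL;
}

/* Stand-in for the proposed PyModule_SetInitFunction(). */
static int set_init_function(Module *m, InitCallback callback) {
    m->init_callback = callback;
    return 0;
}

static Module *import_extension(const char *qualname, const char *path);

/* Step 2 (user code): name, file and registration are already in place. */
static int spam_exec(Module *m, InitContext *context) {
    /* A transitive re-import finds the registered module. */
    Module *again = import_extension(context->qualified_module_name, m->file);
    if (again != m)
        return -1;
    m->initialised = 1;
    return 0;
}

/* Step 1 (user code): only create the module and set the callback. */
static Module spam_storage;

static Module *spam_create(void) {
    create_calls++;
    memset(&spam_storage, 0, sizeof spam_storage);
    set_init_function(&spam_storage, spam_exec);
    return &spam_storage;
}

/* Mock importer: step 1, then set up and register, then step 2. */
static Module *import_extension(const char *qualname, const char *path) {
    Module *existing = lookup(qualname);
    if (existing != NULL)
        return existing;             /* no second run of the init code */

    Module *m = spam_create();       /* this mock only knows "spam" */
    snprintf(m->name, sizeof m->name, "%s", qualname);
    snprintf(m->file, sizeof m->file, "%s", path);
    registry = m;                    /* register before running user code */

    if (m->init_callback != NULL) {
        const char *dot = strrchr(qualname, '.');
        InitContext context = { dot ? dot + 1 : qualname, qualname };
        if (m->init_callback(m, &context) != 0) {
            registry = NULL;         /* undo the registration on failure */
            return NULL;
        }
    }
    return m;
}
```

In this sketch the step 1 function runs exactly once, even though the callback re-imports its own module.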
Open questions
==============
It is not clear how extensions that register more than one module in their module init function (e.g. compiled packages) should be handled. One possibility would be to leave the setup to the user, who would have to know all FQMNs anyway in this case, although not the import file path. Alternatively, the import machinery could use a stack to remember for which modules a callback was registered during the last init function call, set up all of them and then call their callbacks. It's not clear if this meets the intention of the user.
Alternatives
============
1) It would be possible to make extension modules optionally export another symbol, e.g. "PyInit2_modulename", that the shared library loader would call in addition to the required function "PyInit_modulename". This would remove the need for a new API that registers the above callback. The drawback is that it also makes it easier to write broken code, because a Python version or implementation that does not support this second symbol would simply not call it, without error. The new C-API function would let the build fail instead if it is not supported.
2) The callback could be made available as a Python function in the module dict, thus also removing the need for an explicit registration API. However, this approach would add overhead on both sides, the importer code and the user-provided module init code, as it would require additional dictionary handling and the implementation of a one-time Python function in user code. It would also suffer from the problem that missing support in the runtime would pass silently.
3) The callback could be registered statically in the PyModuleDef struct by adding a new field. This is not trivial to do in a backwards compatible way because the struct would grow longer without explicit initialisation by existing user code. Extending PyModuleDef_HEAD_INIT might be possible but would still break at least binary compatibility.
4) Pass a new context argument into the module init function that contains all information necessary to properly and completely set up the module at creation time. This would be much simpler and cleaner than the proposed solution. However, it will not be possible before Python 4 as it breaks backwards compatibility with all existing extension modules at both the source and binary level.
On 08.11.2012 13:47, Stefan Behnel wrote:
[...]
Alternatives ============ ... 3) The callback could be registered statically in the PyModuleDef struct by adding a new field. This is not trivial to do in a backwards compatible way because the struct would grow longer without explicit initialisation by existing user code. Extending PyModuleDef_HEAD_INIT might be possible but would still break at least binary compatibility.
I think the above is a cleaner approach than the callback mechanism. There's no problem in adding new slots to the end of the PyModuleDef struct - we've been doing that for years in many other structs :-)
All you have to do is bump the Python API version number.
(Martin's PEP http://www.python.org/dev/peps/pep-3121/ has the details)
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Nov 08 2012)
M.-A. Lemburg, 08.11.2012 14:01:
On 08.11.2012 13:47, Stefan Behnel wrote:
[...]
Alternatives ============ ... 3) The callback could be registered statically in the PyModuleDef struct by adding a new field. This is not trivial to do in a backwards compatible way because the struct would grow longer without explicit initialisation by existing user code. Extending PyModuleDef_HEAD_INIT might be possible but would still break at least binary compatibility.
I think the above is a cleaner approach than the callback mechanism.
Oh, definitely.
There's no problem in adding new slots to the end of the PyModuleDef struct - we've been doing that for years in many other structs :-)
All you have to do is bump the Python API version number.
(Martin's PEP http://www.python.org/dev/peps/pep-3121/ has the details)
The difference is that this specific struct is provided by user code and (typically) initialised statically. There is no guarantee that user code that does not expect the additional field will initialise it to 0. Failing that, I don't see how we could trust its value in any way. Stefan
Stefan Behnel, 08.11.2012 14:20:
M.-A. Lemburg, 08.11.2012 14:01:
On 08.11.2012 13:47, Stefan Behnel wrote:
[...]
Alternatives ============ ... 3) The callback could be registered statically in the PyModuleDef struct by adding a new field. This is not trivial to do in a backwards compatible way because the struct would grow longer without explicit initialisation by existing user code. Extending PyModuleDef_HEAD_INIT might be possible but would still break at least binary compatibility.
I think the above is a cleaner approach than the callback mechanism.
Oh, definitely.
There's no problem in adding new slots to the end of the PyModuleDef struct - we've been doing that for years in many other structs :-)
All you have to do is bump the Python API version number.
(Martin's PEP http://www.python.org/dev/peps/pep-3121/ has the details)
The difference is that this specific struct is provided by user code and (typically) initialised statically. There is no guarantee that user code that does not expect the additional field will initialise it to 0. Failing that, I don't see how we could trust its value in any way.
Hmm - you're actually right. In C, uninitialised fields in a static struct are set to 0 automatically. Same case as the type structs. That makes your suggestion perfectly valid. I'll rewrite and shorten the proposal. Thanks!
Stefan
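The C rule referred to here can be checked with a few lines of plain C. `ModuleDefV1`, `ModuleDefV2`, `m_init` and `run_init_hook()` are hypothetical, trimmed-down stand-ins for `PyModuleDef` before and after appending a callback slot; they are not the real layout. An initialiser written against the old field list leaves the appended pointer null once the code is recompiled against the new definition, so an importer can treat NULL as "no callback". This only covers source compatibility: an extension already compiled against the old, smaller struct does not benefit from the rule, so binary compatibility remains a separate question.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*init_callback)(void *module);

/* Hypothetical "old" descriptor (trimmed down, not the real PyModuleDef). */
typedef struct {
    const char *m_name;
    const char *m_doc;
    long m_size;
} ModuleDefV1;

/* Hypothetical "new" descriptor: same fields plus one appended slot. */
typedef struct {
    const char *m_name;
    const char *m_doc;
    long m_size;
    init_callback m_init;   /* new trailing field */
} ModuleDefV2;

/* User code written before m_init existed: members omitted from a brace
   initialiser are initialised like objects with static storage duration,
   so m_init becomes a null pointer. */
static ModuleDefV2 legacy_def = { "spam", "legacy module", -1 };

/* Importer side: only call the hook if the user actually set it. */
static int run_init_hook(const ModuleDefV2 *def, void *module) {
    if (def->m_init == NULL)
        return 0;           /* old-style module: nothing more to do */
    return def->m_init(module);
}
```

The same zero-filling applies to partially initialised local structs, which the second assertion below relies on.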
Hi,
here's an updated proposal, adopting Marc-Andre's improvement that uses a new field in the PyModuleDef struct to register the callback. Note that this change no longer keeps up binary compatibility, which may or may not be acceptable for Python 3.4.
Stefan
The problem
===========
Python modules and extension modules are not being set up in the same way. For Python modules, the module is created and set up first, then the module code is being executed. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialisation. This means that it knows neither the __file__ it is being loaded from nor its package (i.e. its FQMN). This hinders relative imports and resource loading. In Py3, it's also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. And without the FQMN, it's not trivial to correctly add the module to sys.modules either.
We specifically run into this for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of a FQMN and correct file path hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are being used at module init time.
The proposal
============
I propose to split the extension module initialisation into two steps in Python 3.4, in a backwards compatible way.
Step 1: The current module init function can be reduced to just creating the module instance and returning it (and potentially doing some simple C level setup). Additionally, and this is the new part, the module init code can register a C callback function in its PyModuleDef struct that will be called after setting up the module.
Step 2: The shared library importer receives the module instance from the module init function, adds __file__, __path__, __package__ and friends to the module dict, and then checks for the callback. If non-NULL, it calls it to continue the module initialisation by user code.
The callback
============
The callback is defined as follows::

    typedef int (*PyModule_init_callback)(PyObject* the_module, PyModuleInitContext* context);

"PyModuleInitContext" is a struct that is meant mostly for making the callback more future-proof by allowing additional parameters to be passed in. For now, I can see a use case for the following fields::

    typedef struct PyModuleInitContext {
        char* module_name;
        char* qualified_module_name;
    } PyModuleInitContext;

Both names are encoded in UTF-8. As for the file path, I consider it best to retrieve it from the module's __file__ attribute as a Python string object to reduce filename encoding problems.
Note that this struct argument is not strictly required (it could be a simple "inquiry" function), but given that this proposal would have been much simpler if the module init function had accepted such an argument in the first place, I consider it a good idea not to let this chance pass by again. The counter arguments would be "keep it simple" and "we already pass in the whole module (and its dict) anyway". Up for debate!
The registration of the callback uses a new field "m_init" in the PyModuleDef struct::

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        inquiry m_reload;
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
        /* --- original fields up to here */
        PyModule_init_callback m_init;  /* post-setup init callback */
    } PyModuleDef;

Implementation
==============
The implementation requires local changes to the extension module importer and a new field in the PyModuleDef struct.
Open questions
==============
It is not clear how extensions that register more than one module in their module init function (e.g. compiled packages) should be handled. One possibility would be to leave the setup to the user, who would have to know all FQMNs anyway in this case, although not the import file path. Alternatively, the import machinery could use a stack to remember for which modules a callback was registered during the last init function call, set up all of them and then call their callbacks. It's not clear if this meets the intention of the user. It's not guaranteed that all of these modules will be related to the module that registered them, in the sense that they should receive the same setup. The best way to fix this correctly might be to make users pass the setup explicitly into the module creation functions in Python 4 (see alternatives below), so that the setup and sys.modules registration can happen directly at this point.
Alternatives
============
1) It would be possible to make extension modules optionally export another symbol, e.g. "PyInit2_modulename", that the shared library loader would call in addition to the required function "PyInit_modulename". This would keep up binary compatibility. The drawback is that it also makes it easier to write broken code, because a Python version or implementation that does not support this second symbol would simply not call it, without error. The new struct field would let the build fail instead if it is not supported.
2) The callback could be made available as a Python function in the module dict, thus also removing the need for an explicit registration API. However, this approach would add overhead on both sides, the importer code and the user-provided module init code, as it would require additional dictionary handling and the implementation of a one-time Python function in user code. It would also suffer from the problem that missing support in the runtime would pass silently.
3) The original proposal used a new C-API function to register the callback explicitly, as opposed to extending the PyModuleDef struct. This has the advantage of keeping up binary compatibility with existing Py3.3 extensions. It has the disadvantage of adding another indirection to the setup procedure where a static function pointer would suffice.
4) Pass a new context argument into the module init function that contains all information necessary to properly and completely set up the module at creation time. This would be much simpler and cleaner than the proposed solution. However, it will not be possible before Python 4 as it breaks backwards compatibility with all existing extension modules at both the source and binary level.
On Fri, Nov 9, 2012 at 12:32 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
here's an updated proposal, adopting Marc-Andre's improvement that uses a new field in the PyModuleDef struct to register the callback. Note that this change no longer keeps up binary compatibility, which may or may not be acceptable for Python 3.4.
It's not acceptable, as PyModuleDef is part of PEP 384's stable ABI. All such public structures are locked at their original size.
3) The original proposal used a new C-API function to register the callback explicitly, as opposed to extending the PyModuleDef struct. This has the advantage of keeping up binary compatibility with existing Py3.3 extensions. It has the disadvantage of adding another indirection to the setup procedure where a static function pointer would suffice.
Module initialisation is (and must be) part of the stable ABI. Indirection (especially through Python) is a *good* thing, as, ideally, any new interfaces should be defined in a way that doesn't increase the maintenance burden for the stable ABI.
I don't agree that the use of a new init API can fail silently, so long as it completely *replaces* the old API, rather than being an addition. That way, since you won't be defining the *old* init function at all, old versions will correctly refuse to load your module.
So I propose that we simply *fix* extension module loading to work the same way as everything else: the loader creates the module object, and passes it in to a new init function to be fully populated. __file__ and __name__ would be passed in as preinitialised module attributes.
The existing PyModule_Create functions would be complemented by a PyModule_SetDef function which allowed a PyModuleDef to be configured on a pre-existing module.
Extension modules that wanted to create multiple Python modules would still be free to do so - it would just be up to the extension initialisation code to call PyModule_Create to construct them and set __file__ based on the __file__ of the passed-in module.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
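The argument that a *replacement* entry point fails loudly rather than silently can be sketched with a mock symbol table standing in for the shared library. The thread does not name the new init function, so `PyInitNew_spam`, `Export`, `old_loader()` and `new_loader()` are hypothetical, and the lookup is a plain table scan rather than a real dynamic loader. An extension that exports only the new symbol is rejected by an old loader, while a new loader creates the module, presets its name and file, and passes it in to be populated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mock module object; NOT the CPython API. */
typedef struct {
    char name[64];   /* __name__, preset by the loader */
    char file[128];  /* __file__, preset by the loader */
    int populated;
} Module;

typedef Module *(*old_init_fn)(void);  /* PyInit_<name>() style */
typedef int (*new_init_fn)(Module *);  /* hypothetical replacement style */

/* One exported symbol of the mock shared library. */
typedef struct {
    const char *symbol;
    old_init_fn old_init;
    new_init_fn new_init;
} Export;

/* Extension code: exports ONLY the new-style entry point. */
static int spam_exec(Module *m) {
    /* Name and file are already set when user code runs. */
    if (m->name[0] == '\0' || m->file[0] == '\0')
        return -1;
    m->populated = 1;
    return 0;
}

static const Export spam_exports[] = {
    { "PyInitNew_spam", NULL, spam_exec },  /* hypothetical symbol name */
};

static const Export *find_export(const Export *table, size_t n, const char *symbol) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].symbol, symbol) == 0)
            return &table[i];
    return NULL;
}

/* Old loader: only knows PyInit_<name>, so it refuses the module. */
static Module *old_loader(const Export *table, size_t n, const char *name) {
    char symbol[96];
    snprintf(symbol, sizeof symbol, "PyInit_%s", name);
    const Export *e = find_export(table, n, symbol);
    if (e == NULL || e->old_init == NULL)
        return NULL;                 /* an import error, not a silent skip */
    return e->old_init();
}

/* New loader: creates the module, presets attributes, then calls init. */
static Module module_storage;

static Module *new_loader(const Export *table, size_t n,
                          const char *name, const char *path) {
    char symbol[96];
    snprintf(symbol, sizeof symbol, "PyInitNew_%s", name);
    const Export *e = find_export(table, n, symbol);
    if (e == NULL || e->new_init == NULL)
        return NULL;
    Module *m = &module_storage;
    memset(m, 0, sizeof *m);
    snprintf(m->name, sizeof m->name, "%s", name);
    snprintf(m->file, sizeof m->file, "%s", path);
    return e->new_init(m) == 0 ? m : NULL;
}
```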
On Thu, Nov 8, 2012 at 7:47 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
[...]
The problem ===========
Python modules and extension modules are not being set up in the same way. For Python modules, the module is created and set up first, then the module code is being executed. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialisation. This means that it knows neither the __file__ it is being loaded from nor its package (i.e. its FQMN). This hinders relative imports and resource loading. In Py3, it's also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. And without the FQMN, it's not trivial to correctly add the module to sys.modules either.
We specifically run into this for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of a FQMN and correct file path hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are being used at module init time.
Or to put it another way, importlib doesn't give you a nice class to inherit from which will handle all of the little details of creating a blank module (or fetching from sys.modules if you are reloading), setting __file__, __cached__, __package__, __name__, __loader__, and (optionally) __path__ for you, and then cleaning up if something goes wrong. It's a pain to do all of this yourself and to get all the details right (i.e. there's a reason that @importlib.util.module_for_loader exists).
The proposal ============
I propose to split the extension module initialisation into two steps in Python 3.4, in a backwards compatible way.
Step 1: The current module init function can be reduced to just creating the module instance and returning it (and potentially doing some simple C level setup). Optionally, after creating the module (and this is the new part), the module init code can register a C callback function that will be called after setting up the module.
Why even bother with the module creation? Why can't Python do that as well and then call the callback?
[...]
1) It would be possible to make extension modules optionally export another symbol, e.g. "PyInit2_modulename", that the shared library loader would call in addition to the required function "PyInit_modulename". This would remove the need for a new API that registers the above callback. The drawback is that it also makes it easier to write broken code because a Python version or implementation that does not support this second symbol would simply not call it, without error. The new C-API function would let the build fail instead if it is not supported.
An alternative to the alternative is that if the PyInit2 function exists it's called instead of the PyInit function, and then the PyInit function is nothing more than a single line function call (or whatever the absolute bare minimum is) into some helper that calls the PyInit2 call properly for backwards ABI compatibility (i.e. passes in whatever details are lost by the indirection in function call). That provides an eventual upgrade path of dropping PyInit and moving over to PyInit2.
-Brett
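The PyInit2 upgrade path can be mocked the same way. Only the `PyInit`/`PyInit2` naming comes from the thread; `run_init2_compat()`, `create_module()` and `Exports` are hypothetical, and the functions are plain C mocks rather than CPython entry points. The old-style `PyInit_spam` shrinks to a one-line call into a helper, while a loader that knows about `PyInit2` bypasses the shim and creates the module itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mock module object; NOT the CPython API. */
typedef struct {
    char name[64];
    int populated;
    int via_shim;    /* records which path created the module (demo only) */
} Module;

typedef Module *(*init_fn)(void);        /* PyInit_<name>() style */
typedef int (*init2_fn)(Module *module); /* PyInit2_<name>() style */

static Module storage;

static Module *create_module(const char *name) {
    memset(&storage, 0, sizeof storage);
    snprintf(storage.name, sizeof storage.name, "%s", name);
    return &storage;
}

/* Hypothetical runtime helper for old loaders: performs the creation step a
   new loader would do, then calls PyInit2. Only the short name is known
   here, i.e. some details are lost by the indirection. */
static Module *run_init2_compat(const char *name, init2_fn init2) {
    Module *m = create_module(name);
    m->via_shim = 1;
    return init2(m) == 0 ? m : NULL;
}

/* Extension code: the real work lives in PyInit2_spam ... */
static int PyInit2_spam(Module *m) {
    m->populated = 1;
    return 0;
}

/* ... and PyInit_spam is reduced to a one-line shim. */
static Module *PyInit_spam(void) {
    return run_init2_compat("spam", PyInit2_spam);
}

/* The entry points the mock library exports. */
typedef struct {
    init_fn init;
    init2_fn init2;  /* NULL if the library predates PyInit2 */
} Exports;

/* Old loader: only knows PyInit_<name>. */
static Module *old_loader(const Exports *ex) {
    return ex->init();
}

/* New loader: prefers PyInit2_<name> and creates the module itself. */
static Module *new_loader(const Exports *ex, const char *qualname) {
    if (ex->init2 != NULL) {
        Module *m = create_module(qualname);
        return ex->init2(m) == 0 ? m : NULL;
    }
    return ex->init();
}
```

Dropping `PyInit_spam` later would leave only the `PyInit2` path, which is the upgrade path described above.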
Hi Brett,
thanks for the feedback.
Brett Cannon, 08.11.2012 15:41:
On Thu, Nov 8, 2012 at 7:47 AM, Stefan Behnel wrote:
I propose to split the extension module initialisation into two steps in Python 3.4, in a backwards compatible way.
Step 1: The current module init function can be reduced to just creating the module instance and returning it (and potentially doing some simple C level setup). Optionally, after creating the module (and this is the new part), the module init code can register a C callback function that will be called after setting up the module.
Why even bother with the module creation? Why can't Python do that as well and then call the callback?
Step 2: The shared library importer receives the module instance from the module init function, adds __file__, __path__, __package__ and friends to the module dict, and then checks for the callback. If non-NULL, it calls it to continue the module initialisation by user code. [...] An alternative to the alternative is that if the PyInit2 function exists it's called instead of the PyInit function, and then the PyInit function is nothing more than a single line function call (or whatever the absolute bare minimum is) into some helper that calls the PyInit2 call properly for backwards ABI compatibility (i.e. passes in whatever details are lost by the indirection in function call). That provides an eventual upgrade path of dropping PyInit and moving over to PyInit2.
In that case, you'd have to export the PyModuleDef descriptor as well, because that's what tells CPython how the module behaves and what to do with it to set it up properly (e.g. allocate module state space on the heap). In fact, if the module init function became a field in the descriptor, it would be enough (taking backwards compatibility aside) if *only* the descriptor was exported and used by the module loader. With the caveat that this might kill some less common but not necessarily illegitimate use cases that do more than just creating and initialising a single module... Stefan
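A minimal sketch of the PyInit/PyInit2 split discussed here, assuming mock types rather than the real CPython headers (MockModule, MockModuleDef, mock_create_and_init and the PyInit2 signature are all hypothetical). The legacy entry point becomes a one-line call into a helper, and that helper cannot create the module without the descriptor, which is why the descriptor would have to be exported as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mock types; they only mimic the shape of the real ones. */
typedef struct {
    const char *name;
    void *state;              /* per-module state, sized by the descriptor */
} MockModule;

typedef struct {
    const char *m_name;
    size_t m_size;            /* bytes of module state to allocate */
} MockModuleDef;

static MockModuleDef spam_def = { "spam", sizeof(int) };

/* New-style init ("PyInit2" in the thread): gets a ready-made module. */
static int mock_PyInit2_spam(MockModule *module)
{
    assert(module != NULL && module->state != NULL);
    *(int *)module->state = 42;
    return 0;
}

/* Hypothetical helper doing the loader's job. It needs the descriptor
   to know how much state to allocate for the new module. */
static MockModule *mock_create_and_init(const MockModuleDef *def,
                                        int (*init2)(MockModule *))
{
    MockModule *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->name = def->m_name;
    m->state = NULL;
    if (def->m_size > 0 && (m->state = calloc(1, def->m_size)) == NULL) {
        free(m);
        return NULL;
    }
    if (init2(m) != 0) {
        free(m->state);
        free(m);
        return NULL;
    }
    return m;
}

/* The legacy entry point shrinks to one forwarding call, keeping the
   old ABI working while the real work moves to PyInit2. */
static MockModule *mock_PyInit_spam(void)
{
    return mock_create_and_init(&spam_def, mock_PyInit2_spam);
}
```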
On Thu, Nov 8, 2012 at 10:00 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
In that case, you'd have to export the PyModuleDef descriptor as well, because that's what tells CPython how the module behaves and what to do with it to set it up properly (e.g. allocate module state space on the heap).
True.
In fact, if the module init function became a field in the descriptor, it would be enough (taking backwards compatibility aside) if *only* the descriptor was exported and used by the module loader.
Also true.
With the caveat that this might kill some less common but not necessarily illegitimate use cases that do more than just creating and initialising a single module...
You mean creating another module in the init function? That's fine, but that should be a call to __import__ anyway and that should handle things properly. Else you are circumventing the import system and you can do everything from scratch. I don't see why this would stop you from doing anything you want, it just simplifies the common case.
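Sketched with hypothetical mock types, the descriptor-only variant could look like this: the extension exports a single descriptor whose (hypothetical) m_initfunc field points to its init code, and the loader creates the module, fills in the qualified name and file path, and only then runs that code. None of these names are actual CPython API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char name[128];   /* fully qualified name, filled in by the loader */
    char file[256];   /* __file__, filled in by the loader */
    int ready;
} MockModule;

/* Hypothetical descriptor with the init function as one of its fields. */
typedef struct {
    const char *m_name;
    int (*m_initfunc)(MockModule *module);  /* hypothetical field */
} MockModuleDef;

static int spam_initfunc(MockModule *module)
{
    /* The loader has already set name and file, so init code could use
       them for relative imports or resource loading. */
    assert(module->name[0] != '\0' && module->file[0] != '\0');
    module->ready = 1;
    return 0;
}

/* The only symbol the extension would need to export. */
const MockModuleDef mock_spam_def = { "spam", spam_initfunc };

/* Loader side: create the module, set import-related attributes,
   then run the extension's own initialisation. */
static int mock_load(const MockModuleDef *def, const char *package,
                     const char *path, MockModule *out)
{
    memset(out, 0, sizeof *out);
    if (package != NULL && package[0] != '\0')
        snprintf(out->name, sizeof out->name, "%s.%s", package, def->m_name);
    else
        snprintf(out->name, sizeof out->name, "%s", def->m_name);
    snprintf(out->file, sizeof out->file, "%s", path);
    return def->m_initfunc(out);
}
```

This is the "simplified common case": the extension never creates its own module object, so the loader can guarantee that name and file are set before any user code runs.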
Brett Cannon, 08.11.2012 16:06:
You mean creating another module in the init function? That's fine, but that should be a call to __import__ anyway and that should handle things properly.
Ok.
Else you are circumventing the import system and you can do everything from scratch.
I guess I'd be ok with putting that burden on users in this case.
I don't see why this would stop you from doing anything you want, it just simplifies the common case.
The only problematic case I see here would be a module that calculates the size of its state space at init time, e.g. based on some platform specifics or environment parameters, anything from the platform specific size of some data type to the runtime configured number of OpenMP threads. That would make the PyModuleDef a compile time static thing - not sure if that's currently required. Stefan
Hi, let me revive and summarize this old thread. Stefan Behnel, 08.11.2012 13:47:
I suspect that this will be put into a proper PEP at some point, but I'd like to bring this up for discussion first. This came out of issues 13429 and 16392.
http://bugs.python.org/issue13429
http://bugs.python.org/issue16392
The problem
===========
Python modules and extension modules are not being set up in the same way. For Python modules, the module is created and set up first, then the module code is being executed. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialisation. This means that it knows neither the __file__ it is being loaded from nor its package (i.e. its FQMN). This hinders relative imports and resource loading. In Py3, it's also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. And without the FQMN, it's not trivial to correctly add the module to sys.modules either.
We specifically run into this for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of a FQMN and correct file path hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are being used at module init time.
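The re-import problem described in the quoted text can be reproduced in miniature. The sketch below is a toy model, not CPython code: a fixed-size table plays the role of sys.modules, and ToyModule, toy_import and spam_init are hypothetical names. Because the module is registered before its init code runs (as Python modules are), a transitive re-import returns the partially initialised module instead of running the init function again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for module objects and sys.modules (not CPython API). */
typedef struct {
    const char *name;
    int init_calls;   /* how often the init code has run */
} ToyModule;

#define MAX_MODULES 8
static ToyModule *registry[MAX_MODULES];  /* plays the role of sys.modules */
static size_t registry_len;

static ToyModule spam = { "spam", 0 };

static ToyModule *registry_lookup(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i]->name, name) == 0)
            return registry[i];
    return NULL;
}

static ToyModule *toy_import(const char *name);

/* Init code that re-imports its own module, as complex module init code
   may do transitively. Without the registry entry this would recurse
   forever. */
static int spam_init(ToyModule *m)
{
    m->init_calls++;
    return toy_import("spam") == m ? 0 : -1;
}

/* Register the module *before* running its init code. */
static ToyModule *toy_import(const char *name)
{
    ToyModule *m = registry_lookup(name);
    if (m != NULL)
        return m;                        /* already (being) imported */
    assert(strcmp(name, "spam") == 0);   /* only one toy module exists */
    assert(registry_len < MAX_MODULES);
    registry[registry_len++] = &spam;
    if (spam_init(&spam) != 0) {
        registry_len--;                  /* roll back a failed import */
        return NULL;
    }
    return &spam;
}
```

Swapping the order (run spam_init first, register afterwards) reproduces the infinite loop, which is the situation extension modules are in when the init function has to create the module itself.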
The outcome of this discussion was that the extension module import protocol needs to change in order to provide all necessary information to the module init function.

Brett Cannon proposed to move the module object creation into the extension module importer, i.e. outside of the user provided module init function. CPython would then load the extension module, create and initialise the module object (set __file__, __name__, etc.) and pass it into the module init function.

I proposed to make the PyModuleDef struct the new entry point instead of just a generic C function, as that would give the module importer all necessary information about the module to create the module object. The only missing bit is the entry point for the new module init function.

Nick Coghlan objected to the proposal of simply extending PyModuleDef with an initialiser function, as the struct is part of the stable ABI.

Alternatives I see:

1) Expose a struct that points to the extension module's PyModuleDef struct and the init function and expose that struct instead.
2) Expose both the PyModuleDef and the init function as public symbols.
3) Provide a public C function as entry point that returns both a PyModuleDef pointer and a module init function pointer.
4) Change the m_init function pointer in PyModuleDef_base from func(void) to func(PyObject*) iff the PyModuleDef struct is exposed as a public symbol.
5) Duplicate PyModuleDef and adapt the new one as in 4).

Alternatives 1) and 2) only differ marginally by the number of public symbols being exposed. 3) has the advantage of supporting more advanced setups, e.g. heap allocation for the PyModuleDef struct. 4) is a hack and has the disadvantage that the signature of the module init function cannot be stored across reinitialisations (PyModuleDef has no "flags" or "state" field to remember it). 5) would fix that, i.e. we could add a proper pointer to the new module init function as well as a flags field for future extensions. A similar effect could be achieved by carefully designing the struct in 1).

I think 1-3 are all reasonable ways to do this, although I don't think 3) will be necessary. 5) would be a clean fix, but has the disadvantage of duplicating an entire struct just to change one field in it.

I'm currently leaning towards 1), with a struct that points to PyModuleDef, module init function and a flags field for future extensions. I understand that this would need to become part of the stable ABI, so explicit extensibility is important to keep up backwards compatibility.

Opinions?

Stefan
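Alternative 1) could look roughly like the following. The export struct, its flag bit and the loader function are hypothetical illustrations, and MockModule/MockModuleDef are stripped-down stand-ins rather than the real PyObject/PyModuleDef. The point of the design is that PyModuleDef itself stays untouched, while the flags field gives the new struct its own room for later extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for illustration; NOT the real CPython declarations. */
typedef struct { const char *name; int initialised; } MockModule;
typedef struct { const char *m_name; size_t m_size; } MockModuleDef;

/* Hypothetical flag bit marking the first layout of the export struct. */
#define MOCK_EXPORT_V1 0x1u

/* Alternative 1): one exported struct pointing to the unchanged
   descriptor and to the new init function, plus a flags field. */
typedef struct {
    unsigned int flags;
    const MockModuleDef *def;
    int (*init)(MockModule *module);
} MockModuleExport;

static const MockModuleDef spam_def = { "spam", 0 };

static int spam_init(MockModule *module)
{
    assert(module != NULL);
    module->initialised = 1;
    return 0;
}

/* The single public symbol the loader would look up. */
const MockModuleExport mock_spam_export = { MOCK_EXPORT_V1, &spam_def, spam_init };

/* Loader side: reject layouts it does not understand, create the module
   from the descriptor, then hand it to the init function. */
static int mock_load_export(const MockModuleExport *exp, MockModule *out)
{
    if ((exp->flags & MOCK_EXPORT_V1) == 0 || exp->def == NULL || exp->init == NULL)
        return -1;
    out->name = exp->def->m_name;
    out->initialised = 0;
    return exp->init(out);
}
```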
On 6 August 2013 15:02, Stefan Behnel <stefan_ml@behnel.de> wrote:
Opinions?
I believe a better option would be to migrate module creation over to a dynamic PyModule_Slot and PyModule_Spec approach in the stable ABI, similar to the one that was defined for types in PEP 384.

A related topic is that over on import-sig, we're currently tinkering with the idea of changing the way *Python* module imports happen to include a separate "ImportSpec" object (exact name TBC). The spec would contain preliminary info on all of the things that the import system can figure out *without* actually importing the module. That list includes all the special attributes that are currently set on modules:

    __loader__
    __name__
    __package__
    __path__
    __file__
    __cached__

(Note that the attributes on the spec *may not* be the same as those in the module's own namespace - for example, __name__ and __spec__.name would differ in a module executed with -m, and __path__ and __spec__.path would end up differing in packages that directly manipulated their __path__ attribute during __init__ execution)

The intent is to clean up some of the ad hoc hackery that was needed to make PEP 420 work, and reduce the amount of duplicated functionality needed in loader implementations.

If you wanted to reboot this thread on import-sig, that would probably be a good thing :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
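For comparison, PEP 384 builds types from a PyType_Spec holding an array of PyType_Slot entries terminated by {0, NULL}, which is passed to PyType_FromSpec. A module counterpart along the lines Nick suggests might look like the sketch below; the slot IDs, MockModuleSpec, MockModuleSlot and the loader function are hypothetical, modelled loosely on the type slots rather than taken from any actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int ready; } MockModule;

/* Hypothetical slot IDs; a real design would need stable numbers. */
enum { MOCK_MOD_END = 0, MOCK_MOD_INIT = 1 };

typedef void (*mock_generic_fn)(void);
typedef int (*mock_initfunc)(MockModule *);

/* Modelled loosely on PEP 384's PyType_Slot / PyType_Spec pair. */
typedef struct { int slot; mock_generic_fn pfunc; } MockModuleSlot;
typedef struct {
    const char *name;
    size_t state_size;            /* bytes of per-module state */
    const MockModuleSlot *slots;  /* terminated by MOCK_MOD_END */
} MockModuleSpec;

static int spam_init(MockModule *m)
{
    assert(m != NULL);
    m->ready = 1;
    return 0;
}

static const MockModuleSlot spam_slots[] = {
    { MOCK_MOD_INIT, (mock_generic_fn)spam_init },
    { MOCK_MOD_END, NULL },
};

static const MockModuleSpec spam_spec = { "spam", 0, spam_slots };

/* Loader side: new capabilities become new slot IDs instead of new
   struct fields, so the spec layout itself never has to change. */
static int mock_module_from_spec(const MockModuleSpec *spec, MockModule *out)
{
    mock_initfunc init = NULL;
    const MockModuleSlot *s;

    for (s = spec->slots; s->slot != MOCK_MOD_END; s++) {
        if (s->slot == MOCK_MOD_INIT)
            init = (mock_initfunc)s->pfunc;   /* cast back to real type */
        else
            return -1;   /* unknown slot: refuse rather than guess */
    }
    out->name = spec->name;
    out->ready = 0;
    return init != NULL ? init(out) : 0;
}
```

The appeal for the stable ABI is that extensibility lives in the slot IDs, so no struct has to be duplicated or grown when a new hook is needed.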
Nick Coghlan, 06.08.2013 07:35:
If you wanted to reboot this thread on import-sig, that would probably be a good thing :)
Sigh. Yet another list to know about and temporarily follow... The import-sig list doesn't seem to be mirrored on Gmane yet. Also, it claims to be dead w.r.t. Py3.4:

"""
The intent is that this SIG will be re-retired after Python 3.3 is released.
"""
-> http://www.python.org/community/sigs/current/import-sig/

"""
Resurrected for landing PEP 382 in Python 3.3.
"""
-> http://mail.python.org/mailman/listinfo/import-sig

Seriously, wouldn't python-dev be just fine for this? It's not like the import system is going to be rewritten for each minor release from now on.

Stefan
On 6 August 2013 16:03, Stefan Behnel <stefan_ml@behnel.de> wrote:
Seriously, wouldn't python-dev be just fine for this? It's not like the import system is going to be rewritten for each minor release from now on.
We currently use it whenever we're doing a deep dive into import system arcana, so python-dev only needs to worry about the question once it's a clearly viable proposal. I think the other thread will be quite relevant to the topic you're interested in, since we hadn't even considered extension modules yet. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nice idea, but some of those may break third-party libraries like Boost.Python that have their own equivalent of the Python/C API. Even SWIG might experience trouble with one or two of those.
Ryan, 06.08.2013 17:02:
Nice idea, but some of those may break third-party libraries like Boost.Python that have their own equivalent of the Python/C API. Even SWIG might experience trouble with one or two of those.
The idea is that this will be an alternative way of initialising a module that CPython will only use if an extension module exports the corresponding symbol. So it won't break existing code, at either the source or the binary level. Stefan
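The backwards-compatible lookup described here can be sketched as follows. A small table stands in for dlsym() so the example has no platform dependencies; the PyInit2_<name> symbol (borrowed from Brett's naming earlier in the thread) and all Mock* names are hypothetical. Only extensions that export the new symbol take the new path; everything else goes through the existing PyInit_<name> entry point unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *name; int via_new_protocol; } MockModule;

typedef void (*generic_fn)(void);
typedef MockModule *(*legacy_init_fn)(void);
typedef int (*new_init_fn)(MockModule *);

/* Stand-in for dlsym(): the symbols one shared library exports. */
typedef struct { const char *symbol; generic_fn addr; } MockSymbol;

static generic_fn mock_lookup(const MockSymbol *table, const char *symbol)
{
    for (; table->symbol != NULL; table++)
        if (strcmp(table->symbol, symbol) == 0)
            return table->addr;
    return NULL;
}

/* An existing extension: exports only the legacy entry point. */
static MockModule old_module = { "old_style", 0 };
static MockModule *old_style_legacy(void) { return &old_module; }

/* An updated extension: keeps the legacy symbol for older Pythons
   and adds the new one. */
static MockModule new_legacy_module = { "new_style", 0 };
static MockModule *new_style_legacy(void) { return &new_legacy_module; }
static int new_style_init(MockModule *m)
{
    assert(m != NULL);
    m->via_new_protocol = 1;
    return 0;
}

static const MockSymbol old_exports[] = {
    { "PyInit_old_style", (generic_fn)old_style_legacy },
    { NULL, NULL },
};
static const MockSymbol new_exports[] = {
    { "PyInit_new_style", (generic_fn)new_style_legacy },
    { "PyInit2_new_style", (generic_fn)new_style_init },
    { NULL, NULL },
};

/* Loader: use the new protocol only if its symbol is exported,
   otherwise fall back to the unchanged legacy path. */
static MockModule *mock_load(const MockSymbol *exports, const char *name,
                             MockModule *fresh)
{
    char sym[128];
    generic_fn fn;

    snprintf(sym, sizeof sym, "PyInit2_%s", name);
    if ((fn = mock_lookup(exports, sym)) != NULL) {
        fresh->name = name;             /* loader creates the module */
        fresh->via_new_protocol = 0;
        return ((new_init_fn)fn)(fresh) == 0 ? fresh : NULL;
    }
    snprintf(sym, sizeof sym, "PyInit_%s", name);
    if ((fn = mock_lookup(exports, sym)) != NULL)
        return ((legacy_init_fn)fn)();
    return NULL;
}
```

Because the legacy symbol is still exported by updated extensions, the same binary would keep loading on interpreters that only know the old protocol.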
participants (5)
- Brett Cannon
- M.-A. Lemburg
- Nick Coghlan
- Ryan
- Stefan Behnel