Pre-PEP: Redesigning extension modules
Hi,

this has been subject to a couple of threads on python-dev already, for
example:

http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986
http://thread.gmane.org/gmane.comp.python.devel/141037/focus=141046

It originally came out of issues 13429 and 16392.

http://bugs.python.org/issue13429
http://bugs.python.org/issue16392

Here's an initial attempt at a PEP for it. It is based on the (unfinished)
ModuleSpec PEP, which is being discussed on the import-sig mailing list.

http://mail.python.org/pipermail/import-sig/2013-August/000688.html

Stefan


PEP: 4XX
Title: Redesigning extension modules
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Behnel <stefan_ml at behnel.de>
BDFL-Delegate: ???
Discussions-To: ???
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.4
Post-History: 23-Aug-2013
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact
with the interpreter runtime. This was last revised for Python 3.0 in
PEP 3121, but did not solve all problems at the time. The goal is to solve
them by bringing extension modules closer to the way Python modules behave.

An implication of this PEP is that extension modules can use arbitrary
types for their module implementation and are no longer restricted to
types.ModuleType. This makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that is
covered by normal garbage collection and supports reloading and
sub-interpreters.


Motivation
==========

Python modules and extension modules are not being set up in the same way.
For Python modules, the module is created and set up first, then the module
code is being executed. For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialisation. This means that it knows neither the __file__ it is being
loaded from nor its package (i.e. 
its fully qualified module name, FQMN). This hinders relative imports and
resource loading. In Py3, it's also not being added to sys.modules, which
means that a (potentially transitive) re-import of the module will really
try to reimport it and thus run into an infinite loop when it executes the
module init function again. And without the FQMN, it is not trivial to
correctly add the module to sys.modules either.

This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of a FQMN and correct
file path hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules has
problems with sub-interpreter support and/or reloading, and it is neither
easy nor efficient with the current infrastructure to support these
features. This PEP also addresses these issues.


The current process
===================

Currently, extension modules export an initialisation function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialised module object. The
function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialise it by adding
attributes to the module dict, creating types, etc.

In the back, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the FQMN of the module object. 
This is not entirely safe as it relies on the module init function creating
its own module object first, but this assumption usually holds in practice.

The main problem in this process is the missing support for passing state
into the module init function, and for safely passing state through to the
module creation code.


The proposal
============

The current extension module initialisation will be deprecated in favour of
a new initialisation scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.

Extension modules that support the new initialisation scheme must export a
new public symbol "PyModuleCreate_modulename", where "modulename" is the
name of the shared library. This mimics the previous naming convention for
the "PyInit_modulename" function.

This symbol must resolve to a C function with the following signature::

    PyObject* (*PyModuleTypeCreateFunction)(PyObject* module_spec)

The "module_spec" argument receives a "ModuleSpec" instance, as defined in
PEP 4XX (FIXME).

(All names are obviously up for debate and bike-shedding at this point.)

When called, this function must create and return a type object, either a
Python class or an extension type that is allocated on the heap. This type
will be instantiated as the module instance by the importer.

There is no requirement for this type to be exactly types.ModuleType or a
subtype of it. Any type can be returned. This follows the current support
for allowing arbitrary objects in sys.modules and makes it easier for
extension modules to define a type that exactly matches their needs for
holding module state.

The constructor of this type must have the following signature::

    def __init__(self, module_spec):

The "module_spec" argument receives the same object as the one passed into
the module type creation function. 
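A pure-Python sketch may help to gauge the shape of the proposed protocol.
All names below are illustrative stand-ins only; in particular, the
ModuleSpec class here is a minimal mock-up of the one from the ModuleSpec
PEP, and module_create() plays the role that a compiled
"PyModuleCreate_example" hook would play::

```python
import types

class ModuleSpec:
    """Minimal stand-in for the ModuleSpec type; illustrative only."""
    def __init__(self, name, origin=None):
        self.name = name
        self.origin = origin

def module_create(module_spec):
    # Plays the role of a PyModuleCreate_example hook: returns a type,
    # which need not be a subtype of types.ModuleType.
    class ExampleModule:
        def __init__(self, module_spec):
            self.__name__ = module_spec.name
            self.__spec__ = module_spec
            self._state = {}  # module state, garbage collected normally

        @property
        def version(self):  # a module-level property
            return "1.0"

    return ExampleModule

# The importer's side of the protocol: call the create function, then
# instantiate the returned type with the same spec.
spec = ModuleSpec("example", origin="example.so")
module = module_create(spec)(spec)
```

Note that the resulting "module" is a plain instance of an arbitrary class,
which is what allows module-level properties and normal garbage collection
of its state.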
Implementation
==============

XXX - not started


Reloading and Sub-Interpreters
==============================

To "reload" an extension module, the module create function is executed
again and returns a new module type. This type is then instantiated by the
original module loader and replaces the previous entry in sys.modules. Once
the last references to the previous module and its type are gone, both will
be subject to normal garbage collection.

Sub-interpreter support is an inherent property of the design. During
import in the sub-interpreter, the module create function is executed and
returns a new module type that is local to the sub-interpreter. Both the
type and its module instance are subject to garbage collection in the
sub-interpreter.


Open questions
==============

It is not immediately obvious how extensions should be handled that want to
register more than one module in their module init function, e.g. compiled
packages. One possibility would be to leave the setup to the user, who
would have to know all FQMNs anyway in this case (or could construct them
from the module spec of the current module), although not the import file
path. A C-API could be provided to register new module types in the current
interpreter, given a user provided ModuleSpec.

There is no inherent requirement for the module creation function to
actually return a type. It could return an arbitrary callable that creates
a 'modulish' object when called. Should there be a type check in place that
makes sure that what it returns is a type? I don't currently see a need for
this.


Copyright
=========

This document has been placed in the public domain.
Hi, Le Fri, 23 Aug 2013 10:50:18 +0200, Stefan Behnel <stefan_ml@behnel.de> a écrit :
Here's an initial attempt at a PEP for it. It is based on the (unfinished) ModuleSpec PEP, which is being discussed on the import-sig mailing list.
Thanks for trying this. I think the PEP should contain working example code for module initialization (and creation), to help gauge the complexity for module writers. Regards Antoine.
On 23 August 2013 19:18, Antoine Pitrou <solipsis@pitrou.net> wrote:
Hi,
Le Fri, 23 Aug 2013 10:50:18 +0200, Stefan Behnel <stefan_ml@behnel.de> a écrit :
Here's an initial attempt at a PEP for it. It is based on the (unfinished) ModuleSpec PEP, which is being discussed on the import-sig mailing list.
Thanks for trying this. I think the PEP should contain working example code for module initialization (and creation), to help gauge the complexity for module writers.
I've been thinking a lot about this as part of reviewing PEP 451 (the
ModuleSpec PEP that Stefan's pre-PEP mentions). The relevant feedback on
import-sig hasn't made it into PEP 451 yet (Eric is still considering the
suggestion), but what I'm proposing is a new relatively *stateless* API for
loaders, which consists of two methods:

    def create_module(self, spec):
        """Given a ModuleSpec, return the object to be added to sys.modules"""

    def exec_module(self, mod):
        """Execute the given module, updating it for the current system state"""

create_module would be optional - if not defined, the import system would
automatically create a normal module object. If it is defined, the import
system would call it and then take care of setting all the standard
attributes (__name__, __spec__, etc) on the result if the loader hadn't
already set them.

exec_module would be required, and is the part that actually fully
initialises the module. "imp.reload" would then translate to calling
exec_module on an existing module without recreating it.

For loaders that provide the new API, the global import state manipulation
would all be handled by the import system. Such loaders would still be free
to provide load_module() anyway for backwards compatibility with earlier
Python versions, since the new API would take precedence.

In this context, the API I was considering for extension modules was
slightly different from that in Stefan's proto-PEP (although it was based
on some of Stefan's suggestions in the earlier threads). Specifically, I'm
thinking of an API like this that does a better job of supporting
reloading:

    PyObject * PyImportCreate_<modulename>(PyObject *spec);  /* Optional */
    int PyImportExec_<modulename>(PyObject *mod);

Implementing PyImportCreate would only be needed if you had C level state
to store - if you're happy storing everything in the module globals, then
you would only need to implement PyImportExec. 
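As a rough sketch of the division of labour (not working import machinery;
Spec and DemoLoader are made-up names for illustration):

```python
import types

class Spec:
    """Minimal stand-in for a PEP 451 ModuleSpec."""
    def __init__(self, name):
        self.name = name

class DemoLoader:
    def create_module(self, spec):
        # Optional: returning None here would tell the import system
        # to create a normal module object itself.
        return types.ModuleType(spec.name)

    def exec_module(self, mod):
        # Required: fully (re)initialise the module in place.
        mod.answer = 42

# Roughly what the import system would do with such a loader:
loader = DemoLoader()
spec = Spec("demo")
mod = loader.create_module(spec)
if mod is None:
    mod = types.ModuleType(spec.name)
mod.__spec__ = spec  # import system fills in the standard attributes
loader.exec_module(mod)

# "imp.reload" then reduces to re-running exec_module on the same object:
loader.exec_module(mod)
```

The key property is that reload never replaces the module object, it only
re-executes it.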
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 24 August 2013 15:51, Nick Coghlan <ncoghlan@gmail.com> wrote:
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view.
I prototyped as much as I could without PEP 451's ModuleSpec support here:

https://bitbucket.org/ncoghlan/cpython_sandbox/commits/branch/new_extension_...

On systems that use dynload_shlib (at least Linux & the BSDs), this branch
allows extension modules to be imported if they provide a PyImportExec_NAME
hook. The new hook is preferred to the existing PyInit_NAME hook, so
extension modules using the stable ABI can provide both and degrade to the
legacy initialisation API on older versions of Python.

The PyImportExec hook is called with a pre-created module object that the
hook is then expected to populate. To aid in this task, I added two new
APIs:

    PyModule_SetDocString
    PyModule_AddFunctions

These cover setting the docstring and adding module level functions, tasks
that are handled through the PyModule_Create API when using the PyInit_NAME
style hook.

The _testimportexec.c module was derived from the existing example
xxlimited.c module, with a few name changes. The main functional difference
is that _testimportexec uses the new API, so the module object is created
externally and passed in to the API, rather than being created by the
extension module. The effect of this can be seen in the test suite, where
ImportExecTests.test_fresh_module shows that loading the module twice will
create two *different* modules, unlike the legacy API.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
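The effect of the exec-style hook can be emulated in pure Python.
import_exec below is only a stand-in for a PyImportExec_NAME hook, not a
real API; the attribute assignments roughly correspond to what
PyModule_SetDocString and PyModule_AddFunctions do at the C level:

```python
import types

def import_exec(mod):
    # Populate a module object that was created by the import machinery,
    # rather than creating one ourselves.
    mod.__doc__ = "demo extension"        # cf. PyModule_SetDocString
    mod.square = lambda x: x * x          # cf. PyModule_AddFunctions

def load(name):
    mod = types.ModuleType(name)  # created externally, not by the hook
    import_exec(mod)
    return mod

first = load("_testimportexec")
second = load("_testimportexec")
# Unlike the legacy PyInit_NAME API, loading twice yields two distinct
# module objects.
```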
On Sat, 24 Aug 2013 21:36:51 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
On 24 August 2013 15:51, Nick Coghlan <ncoghlan@gmail.com> wrote:
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view.
I prototyped as much as I could without PEP 451's ModuleSpec support here:
https://bitbucket.org/ncoghlan/cpython_sandbox/commits/branch/new_extension_...
On systems that use dynload_shlib (at least Linux & the BSDs), this branch allows extension modules to be imported if they provide a PyImportExec_NAME hook. The new hook is preferred to the existing PyInit_NAME hook, so extension modules using the stable ABI can provide both and degrade to the legacy initialisation API on older versions of Python.
The PyImportExec hook is called with a pre-created module object that the hook is then expected to populate. To aid in this task, I added two new APIs:
PyModule_SetDocString PyModule_AddFunctions
I was thinking about something like PyType_FromSpec, only specialized for module subclasses to make it easier to declare them (e.g. PyModuleType_FromSpec). This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally. Regards Antoine.
Antoine Pitrou, 24.08.2013 13:53:
This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally.
My line of thought was more like: if Python code can stick anything into sys.modules and the runtime doesn't care, why can't extension modules stick anything into sys.modules as well? I can't really see the advantage of requiring a subtype here. Or even just a type, as I said. I guess we'll have to start using this in real code to see if it makes any difference. Stefan
On Sat, 24 Aug 2013 14:51:42 +0200 Stefan Behnel <stefan_ml@behnel.de> wrote:
Antoine Pitrou, 24.08.2013 13:53:
This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally.
My line of thought was more like: if Python code can stick anything into sys.modules and the runtime doesn't care, why can't extension modules stick anything into sys.modules as well?
I can't really see the advantage of requiring a subtype here. Or even just a type, as I said.
sys.modules doesn't care indeed. There's still the whole extension-specific code, though, i.e. the eternal PyModuleDef store and the state management routines. How much of it would remain with your proposal? Regards Antoine.
Antoine Pitrou, 24.08.2013 15:00:
On Sat, 24 Aug 2013 14:51:42 +0200 Stefan Behnel wrote:
Antoine Pitrou, 24.08.2013 13:53:
This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally.
My line of thought was more like: if Python code can stick anything into sys.modules and the runtime doesn't care, why can't extension modules stick anything into sys.modules as well?
I can't really see the advantage of requiring a subtype here. Or even just a type, as I said.
sys.modules doesn't care indeed. There's still the whole extension-specific code, though, i.e. the eternal PyModuleDef store and the state management routines. How much of it would remain with your proposal?
PEP 3121 would no longer be necessary. Extension types can do all we need. No more special casing of modules, that was the idea. Stefan
On 8/24/2013 8:51 AM, Stefan Behnel wrote:
Antoine Pitrou, 24.08.2013 13:53:
This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally.
My line of thought was more like: if Python code can stick anything into sys.modules and the runtime doesn't care, why can't extension modules stick anything into sys.modules as well?
Being able to stick anything in sys.modules in CPython is an implementation artifact rather than a language feature.

"sys.modules
This is a dictionary that maps module names to modules which have already been loaded."

This implies to me that an implementation could use a dict subclass (or subtype if you prefer) that checks that keys are names and values ModuleType instances (or None).

"This can be manipulated to force reloading of modules and other tricks."

I guess this refers to the undocumented (at least here) option of None as a signal value.
I can't really see the advantage of requiring a subtype here. Or even just a type, as I said.
A 'module' has to work with the import machinery and user code. I would ask, "What is the advantage of loosening the current spec?" (Or reinterpreting 'module', if you prefer.) Loosening is hard to undo once done. -- Terry Jan Reedy
2013/8/24 Terry Reedy <tjreedy@udel.edu>:
On 8/24/2013 8:51 AM, Stefan Behnel wrote:
Antoine Pitrou, 24.08.2013 13:53:
This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally.
My line of thought was more like: if Python code can stick anything into sys.modules and the runtime doesn't care, why can't extension modules stick anything into sys.modules as well?
Being able to stick anything in sys.modules in CPython is an implementation artifact rather than language feature.
This is not really true. Many people use this feature to replace modules as they are being imported with other things. -- Regards, Benjamin
On 25 Aug 2013 05:19, "Benjamin Peterson" <benjamin@python.org> wrote:
2013/8/24 Terry Reedy <tjreedy@udel.edu>:
On 8/24/2013 8:51 AM, Stefan Behnel wrote:
Antoine Pitrou, 24.08.2013 13:53:
This would also imply extension modules have to be subclasses of the built-in module type. They can't be arbitrary objects like Stefan proposed. I'm not sure what the latter enables, but it would probably make things more difficult internally.
My line of thought was more like: if Python code can stick anything into sys.modules and the runtime doesn't care, why can't extension modules stick anything into sys.modules as well?
Being able to stick anything in sys.modules in CPython is an implementation artifact rather than language feature.
This is not really true. Many people use this feature to replace modules as they are being imported with other things.
Right - arbitrary objects in sys.modules is definitely a supported feature (e.g. most lazy import mechanisms rely on that). However, such objects should really provide the module level attributes the import system expects for ducktyping purposes, which is why I suggest the import system should automatically take care of setting those. Cheers, Nick.
-- Regards, Benjamin
Nick Coghlan, 24.08.2013 13:36:
On 24 August 2013 15:51, Nick Coghlan wrote:
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view.
I prototyped as much as I could without PEP 451's ModuleSpec support here:
https://bitbucket.org/ncoghlan/cpython_sandbox/commits/branch/new_extension_...
Cool. I'll take a look.
On systems that use dynload_shlib (at least Linux & the BSDs), this branch allows extension modules to be imported if they provide a PyImportExec_NAME hook. The new hook is preferred to the existing PyInit_NAME hook, so extension modules using the stable ABI can provide both and degrade to the legacy initialisation API on older versions of Python.
Hmm, right, good call. Since both init schemes have to be part of the stable ABI, we can't rely on people compiling out one or the other. So using the old one as a fallback should work. However, only actual usage in code will tell us how it feels on the user side. Supporting both in the same binary will most likely complicate things quite a bit.
The PyImportExec hook is called with a pre-created module object that the hook is then expected to populate. To aid in this task, I added two new APIs:
PyModule_SetDocString PyModule_AddFunctions
These cover setting the docstring and adding module level functions, tasks that are handled through the PyModule_Create API when using the PyInit_NAME style hook.
What are those needed for? If you subtype the module type, or provide an arbitrary extension type as implementation, you'd get these for free, wouldn't you? It's in no way different from setting up an extension type.
The _testimportexec.c module
Where can I find that module?
was derived from the existing example xxlimited.c module, with a few name changes. The main functional difference is that _testimportexec uses the new API, so the module object is created externally and passed in to the API, rather than being created by the extension module. The effect of this can be seen in the test suite, where ImportExecTests.test_fresh_module shows that loading the module twice will create two *different* modules, unlike the legacy API.
Stefan
On 24 August 2013 23:19, Stefan Behnel <stefan_ml@behnel.de> wrote:
Nick Coghlan, 24.08.2013 13:36:
On 24 August 2013 15:51, Nick Coghlan wrote:
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view.
I prototyped as much as I could without PEP 451's ModuleSpec support here:
https://bitbucket.org/ncoghlan/cpython_sandbox/commits/branch/new_extension_...
Cool. I'll take a look.
The new _PyImport_CreateAndExecExtensionModule function does the heavy lifting: https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e... One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype). Such modules can still support reloading (e.g. to pick up reloaded or removed module dependencies) by providing PyImportExec_NAME as well. (in a PEP 451 world, this would likely be split up as two separate functions, one for create, one for exec)
On systems that use dynload_shlib (at least Linux & the BSDs), this branch allows extension modules to be imported if they provide a PyImportExec_NAME hook. The new hook is preferred to the existing PyInit_NAME hook, so extension modules using the stable ABI can provide both and degrade to the legacy initialisation API on older versions of Python.
Hmm, right, good call. Since both init schemes have to be part of the stable ABI, we can't rely on people compiling out one or the other. So using the old one as a fallback should work. However, only actual usage in code will tell us how it feels on the user side. Supporting both in the same binary will most likely complicate things quite a bit.
It shouldn't be too bad - the PyInit_NAME fallback would just need to do the equivalent of calling PyImportCreate_NAME (or PyModule_Create if not using a custom object), call PyImportExec_NAME on it, and then return the result. Modules that genuinely *needed* the new behaviour wouldn't be able to provide a sensible fallback, and would thus be limited to Python 3.4+
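In pseudo-Python terms (all names invented for illustration), the suggested
fallback amounts to building the legacy init hook on top of the new pair of
hooks:

```python
import types

def import_create():
    # Stands in for an optional PyImportCreate_NAME hook; a module
    # without one would use the equivalent of PyModule_Create here.
    return types.ModuleType("legacydemo")

def import_exec(mod):
    # Stands in for the PyImportExec_NAME hook: populate the module.
    mod.value = 1
    return 0

def legacy_init():
    # What a backwards-compatible PyInit_NAME hook would do:
    # create the module, exec it, then return the result.
    mod = import_create()
    import_exec(mod)
    return mod

mod = legacy_init()
```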
The PyImportExec hook is called with a pre-created module object that the hook is then expected to populate. To aid in this task, I added two new APIs:
PyModule_SetDocString PyModule_AddFunctions
These cover setting the docstring and adding module level functions, tasks that are handled through the PyModule_Create API when using the PyInit_NAME style hook.
What are those needed for? If you subtype the module type, or provide an arbitrary extension type as implementation, you'd get these for free, wouldn't you? It's in no way different from setting up an extension type.
The idea is to let people use an import system provided module object if they don't define a custom PyImportCreate_NAME hook. Setting the docstring and adding module level functions were the two things that PyModule_Create previously handled neatly through the PyModuleDef struct. The two new API functions just break out those subsets as separate operations to call on the import system provided module.
The _testimportexec.c module
Where can I find that module?
Oops, forgot to add it to the repo. Uploaded now: https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan, 24.08.2013 16:22:
On 24 August 2013 23:19, Stefan Behnel wrote:
Nick Coghlan, 24.08.2013 13:36:
On 24 August 2013 15:51, Nick Coghlan wrote:
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view.
I prototyped as much as I could without PEP 451's ModuleSpec support here:
https://bitbucket.org/ncoghlan/cpython_sandbox/commits/branch/new_extension_...
Cool. I'll take a look.
The new _PyImport_CreateAndExecExtensionModule function does the heavy lifting:
https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e...
One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype).
Is it really a common case for an extension module not to need any C level state at all? I mean, this might work for very simple accelerator modules with only a few stand-alone functions. But anything non-trivial will almost certainly have some kind of global state, cache, external library, etc., and that state is best stored at the C level for safety reasons.
Such modules can still support reloading (e.g. to pick up reloaded or removed module dependencies) by providing PyImportExec_NAME as well.
(in a PEP 451 world, this would likely be split up as two separate functions, one for create, one for exec)
Can't we just always require extension modules to implement their own type? Sure, it's a lot of boilerplate code, but that could be handled by a simple C code generator or maybe even a copy&paste example in the docs. I would like to avoid making it too easy for users in the future to get anything wrong with reloading or sub-interpreters. Most people won't test these things for their own code and the harder it is to make them not work, the more likely it is that a given set of dependencies will properly work in a sub-interpreter. If users are required to implement their own type, I think it would be more obvious where to put global module state, how to define functions (i.e. module methods), how to handle garbage collection at the global module level, etc.
On systems that use dynload_shlib (at least Linux & the BSDs), this branch allows extension modules to be imported if they provide a PyImportExec_NAME hook. The new hook is preferred to the existing PyInit_NAME hook, so extension modules using the stable ABI can provide both and degrade to the legacy initialisation API on older versions of Python.
Hmm, right, good call. Since both init schemes have to be part of the stable ABI, we can't rely on people compiling out one or the other. So using the old one as a fallback should work. However, only actual usage in code will tell us how it feels on the user side. Supporting both in the same binary will most likely complicate things quite a bit.
It shouldn't be too bad - the PyInit_NAME fallback would just need to do the equivalent of calling PyImportCreate_NAME (or PyModule_Create if not using a custom object), call PyImportExec_NAME on it, and then return the result.
Modules that genuinely *needed* the new behaviour wouldn't be able to provide a sensible fallback, and would thus be limited to Python 3.4+
Right. I only saw it from the POV of Cython, which *will* have to support both, and *will* use the new feature in Py3.4+. No idea how that is going to work, but we've found so many tricks and work-arounds in the past that I'm sure it'll work somehow. Module level properties are just way too tempting not to make use of them. Stefan
On 25 Aug 2013 01:44, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
Nick Coghlan, 24.08.2013 16:22:
On 24 August 2013 23:19, Stefan Behnel wrote:
Nick Coghlan, 24.08.2013 13:36:
On 24 August 2013 15:51, Nick Coghlan wrote:
My current plan is to create an experimental prototype of this approach this weekend. That will include stdlib test cases, so it will also show how it looks from the extension developer's point of view.
I prototyped as much as I could without PEP 451's ModuleSpec support here:
https://bitbucket.org/ncoghlan/cpython_sandbox/commits/branch/new_extension_...
Cool. I'll take a look.
The new _PyImport_CreateAndExecExtensionModule function does the heavy lifting:
https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e...
One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype).
Is it really a common case for an extension module not to need any C level state at all? I mean, this might work for very simple accelerator modules with only a few stand-alone functions. But anything non-trivial will almost certainly have some kind of global state, cache, external library, etc., and that state is best stored at the C level for safety reasons.
I'd prefer to encourage people to put that state on an exported *type* rather than directly in the module global state. So while I agree we need to *support* C level module globals, I'd prefer to provide a simpler alternative that avoids them. We also need the create/exec split to properly support reloading. Reload *must* reinitialize the object already in sys.modules instead of inserting a different object or it completely misses the point of reloading modules over deleting and reimporting them (i.e. implicitly affecting the references from other modules that imported the original object).
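The identity requirement can be shown in a few lines of Python (exec_module
here is only a stand-in for the proposed exec hook):

```python
import types

def exec_module(mod):
    # Re-initialise the module in place, preserving its identity.
    mod.counter = getattr(mod, "counter", 0) + 1

mod = types.ModuleType("stateful")
exec_module(mod)

# Another module that imported "stateful" keeps a reference to the
# very same object ...
held_elsewhere = mod

# ... so a reload that re-executes the existing object updates that
# reference too; substituting a new object into sys.modules would not.
exec_module(mod)
```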
Such modules can still support reloading (e.g. to pick up reloaded or removed module dependencies) by providing PyImportExec_NAME as well.
(in a PEP 451 world, this would likely be split up as two separate functions, one for create, one for exec)
Can't we just always require extension modules to implement their own type? Sure, it's a lot of boiler plate code, but that could be handled by a simple C code generator or maybe even a copy&paste example in the docs. I would like to avoid making it too easy for users in the future to get anything wrong with reloading or sub-interpreters. Most people won't test these things for their own code and the harder it is to make them not work, the more likely it is that a given set of dependencies will properly work in a sub-interpreter.
If users are required to implement their own type, I think it would be more obvious where to put global module state, how to define functions (i.e. module methods), how to handle garbage collection at the global module level, etc.
Take a look at the current example - everything gets stored in the module dict for the simple case with no C level global state. The module level functions are still added through a PyMethodDef array, the docstring still comes from a C char pointer. I did have to fix the custom type's tp_new method to use the type pointer passed in by the interpreter rather than a C static global pointer, but that change would also have been needed if defining a custom type. Since Antoine fixed it, there's also nothing particularly quirky about module destruction in 3.4+ - cyclic GC should "just work". Cheers, Nick.
Nick Coghlan, 24.08.2013 23:43:
On 25 Aug 2013 01:44, "Stefan Behnel" wrote:
Nick Coghlan, 24.08.2013 16:22:
The new _PyImport_CreateAndExecExtensionModule function does the heavy lifting:
https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e...
One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype).
Is it really a common case for an extension module not to need any C level state at all? I mean, this might work for very simple accelerator modules with only a few stand-alone functions. But anything non-trivial will almost certainly have some kind of global state, cache, external library, etc., and that state is best stored at the C level for safety reasons.
I'd prefer to encourage people to put that state on an exported *type* rather than directly in the module global state. So while I agree we need to *support* C level module globals, I'd prefer to provide a simpler alternative that avoids them.
But that has an impact on the API then. Why do you want the users of an extension module to go through a separate object (even if it's just a singleton, for example) instead of going through functions at the module level? We don't currently encourage or propose this design for Python modules either. Quite the contrary, it's extremely common for Python modules to provide most of their functionality at the function level. And IMHO that's a good thing. Note that even global functions usually hold state, be it in the form of globally imported modules, global caches, constants, ...
We also need the create/exec split to properly support reloading. Reload *must* reinitialize the object already in sys.modules instead of inserting a different object or it completely misses the point of reloading modules over deleting and reimporting them (i.e. implicitly affecting the references from other modules that imported the original object).
Interesting. I never thought of it that way.

I'm not sure this can be done in general. What if the module has threads running that access the global state? In that case, reinitialising the module object itself would almost certainly lead to a crash.

And what if you do "from extmodule import some_function" in a Python module? Then reloading couldn't replace that reference, just as for normal Python modules. Meaning that you'd still have to keep both modules properly alive in order to prevent crashes due to lost global state of the imported function.

The difference to Python modules here is that in Python code, you'll get some kind of exception if state is lost during a reload. In C code, you'll most likely get a crash.

How would you even make sure global state is properly cleaned up? Would you call tp_clear() on the module object before re-running the init code? Or how else would you enable the init code to do the right thing during both the first run (where global state is uninitialised) and subsequent runs (where global state may hold valid state and owned Python references)?

Even tp_clear() may not be enough, because it's only meant to clean up Python references, not C-level state. Basically, for reloading to be correct without changing the object reference, it would have to go all the way through tp_dealloc(), catch the object at the very end, right before it gets freed, and then re-initialise it.

This sounds like we need some kind of indirection (as you mentioned above), but without the API impact that a separate type implies. Simply making modules an arbitrary extension type, as I proposed, cannot solve this.

(Actually, my intuition tells me that if it can't really be made to work 100% for Python modules, e.g. due to the from-import case, why bother with it for extension types?)
Such modules can still support reloading (e.g. to pick up reloaded or removed module dependencies) by providing PyImportExec_NAME as well.
(in a PEP 451 world, this would likely be split up as two separate functions, one for create, one for exec)
Can't we just always require extension modules to implement their own type? Sure, it's a lot of boilerplate code, but that could be handled by a simple C code generator or maybe even a copy&paste example in the docs. I would like to avoid making it too easy for users in the future to get anything wrong with reloading or sub-interpreters. Most people won't test these things for their own code, and the harder it is to get them wrong, the more likely it is that a given set of dependencies will properly work in a sub-interpreter.
If users are required to implement their own type, I think it would be more obvious where to put global module state, how to define functions (i.e. module methods), how to handle garbage collection at the global module level, etc.
Take a look at the current example - everything gets stored in the module dict for the simple case with no C level global state.
Well, you're storing types there. And those types are your module API. I understand that it's just an example, but I don't think it matches a common case. As far as I can see, the types are not even interacting with each other, let alone doing any C-level access of each other. We should try to focus on the normal case that needs C-level state and C-level field access of extension types. Once that's solved, we can still think about how to make the really simple cases simpler, if it turns out that they are not simple enough.

Keeping everything in the module dict is a design that (IMHO) is too error prone. C state should be kept safely at the C level, outside of the reach of Python code. I don't want users of my extension module to be able to provoke a crash by saying "extmodule._xyz = None".

I didn't know about PyType_FromSpec(), BTW. It looks like a nice addition for manually written code (although useless for Cython).

Stefan
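Stefan's "extmodule._xyz = None" scenario can be mimicked with a plain Python module object (a sketch with hypothetical names; in Python you get an exception where C code reading a clobbered pointer would likely crash):

```python
import types

# Build a throwaway module whose function depends on "private" dict state.
mod = types.ModuleType("extdemo")
exec("_xyz = {'answer': 42}\n"
     "def get_answer():\n"
     "    return _xyz['answer']\n", mod.__dict__)

assert mod.get_answer() == 42

mod._xyz = None          # user code tampering with module-dict state
try:
    mod.get_answer()     # Python raises TypeError here; C code doing the
except TypeError:        # equivalent pointer access would likely segfault
    pass
```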
On 8/25/2013 7:54 AM, Stefan Behnel wrote:
And what if you do "from extmodule import some_function" in a Python module? Then reloading couldn't replace that reference, just as for normal Python modules. Meaning that you'd still have to keep both modules properly alive in order to prevent crashes due to lost global state of the imported function.
People who want to reload modules sometimes know before they start that they will want to. If so, they can just 'import' instead of 'from import' and access everything through the module. There is still the problem of persistent class instances directly accessing classes for attributes, but maybe that can be directed through the class also. -- Terry Jan Reedy
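The from-import behaviour discussed above is observable with plain Python modules; here is a self-contained sketch using a throwaway module written to a temp directory (all names hypothetical):

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True          # keep the demo free of stale .pyc files

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "extdemo.py").write_text("def greet():\n    return 'version one'\n")
sys.path.insert(0, str(tmp))

import extdemo
from extdemo import greet               # direct reference to the function object

(tmp / "extdemo.py").write_text("def greet():\n    return 'v2'\n")
importlib.invalidate_caches()
reloaded = importlib.reload(extdemo)

assert reloaded is extdemo              # reload reuses the object in sys.modules
assert extdemo.greet() == 'v2'          # attribute access sees the new code
assert greet() == 'version one'         # the from-imported reference is stale
```

This is exactly Terry's point: going through the module object survives a reload, while a from-imported reference keeps the old function (and, for an extension module, the old C-level state) alive.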
Oops - I had a draft from a few days ago that I was interrupted before sending. I've finished editing the parts I believe are still relevant. On 25 Aug 2013 21:56, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
Nick Coghlan, 24.08.2013 23:43:
On 25 Aug 2013 01:44, "Stefan Behnel" wrote:
Nick Coghlan, 24.08.2013 16:22:
The new _PyImport_CreateAndExecExtensionModule function does the heavy lifting:
https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e...
One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype).
Is it really a common case for an extension module not to need any C level state at all? I mean, this might work for very simple accelerator modules with only a few stand-alone functions. But anything non-trivial will almost certainly have some kind of global state, cache, external library, etc., and that state is best stored at the C level for safety reasons.
In my experience, most extension authors aren't writing high performance C accelerators, they're exposing an existing C API to Python. It's the cffi use case rather than the Cython use case. My primary experience of C extensions is with such wrapper modules, and for those, the exec portion of the new API is exactly what you want. The components of the wrapper module don't share global state, they just translate between Python and a pre-existing externally stateless C API. For that use case, a precreated module to populate with types and functions is exactly what you want to keep things simple and stateless at the C level.
I'd prefer to encourage people to put that state on an exported *type* rather than directly in the module global state. So while I agree we need to *support* C level module globals, I'd prefer to provide a simpler alternative that avoids them.
But that has an impact on the API then. Why do you want the users of an extension module to go through a separate object (even if it's just a singleton, for example) instead of going through functions at the module level? We don't currently encourage or propose this design for Python modules either. Quite the contrary, it's extremely common for Python modules to provide most of their functionality at the function level. And IMHO that's a good thing.
Mutable module global state is always a recipe for obscure bugs, and not something I will ever let through code review without a really good rationale. Hidden process global state is never good, just sometimes a necessary evil. However, keep in mind my patch is currently just the part I can implement without PEP 451 module spec objects. Once those are available, then I can implement the initial hook that supports returning a completely custom object.
Note that even global functions usually hold state, be it in the form of globally imported modules, global caches, constants, ...
If they can be shared safely across multiple instances of the module (e.g. immutable constants), then these can be shared at the C level. Otherwise, a custom Python type will be needed to make them instance specific.
We also need the create/exec split to properly support reloading. Reload *must* reinitialize the object already in sys.modules instead of inserting a different object or it completely misses the point of reloading modules over deleting and reimporting them (i.e. implicitly affecting the references from other modules that imported the original object).
Interesting. I never thought of it that way.
I'm not sure this can be done in general. What if the module has threads running that access the global state? In that case, reinitialising the module object itself would almost certainly lead to a crash.
And what if you do "from extmodule import some_function" in a Python module? Then reloading couldn't replace that reference, just as for normal Python modules. Meaning that you'd still have to keep both modules properly alive in order to prevent crashes due to lost global state of the imported function.
My current proposal on import-sig is to make the first hook "prepare_module", and pass in the existing object in the reload case. For the extension loader, this would be reflected in the signature of the C level hook as well, so the module could decide for itself if it supported reloading.
The difference to Python modules here is that in Python code, you'll get some kind of exception if state is lost during a reload. In C code, you'll most likely get a crash.
Agreed. This is actually my primary motivation for trying to improve the "can this be reloaded or not?" aspects of the loader API in PEP 451.
How would you even make sure global state is properly cleaned up? Would you call tp_clear() on the module object before re-running the init code? Or how else would you enable the init code to do the right thing during both the first run (where global state is uninitialised) and subsequent runs (where global state may hold valid state and owned Python references)?
Up to the module. For Python modules, we just blindly overwrite things and let the GC sort it out. (keep in mind existing extension modules using the existing API will still never be reloaded)
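Nick's "blindly overwrite and let the GC sort it out" behaviour for Python modules can be sketched by executing new source into the old module dict, which is essentially what reload does:

```python
import types

mod = types.ModuleType("demo")
exec("OBSOLETE = 'old'\nVALUE = 1", mod.__dict__)   # initial "import"
exec("VALUE = 2", mod.__dict__)                     # "reload": overwrite in place

assert mod.VALUE == 2           # rebound names take the new value
assert mod.OBSOLETE == 'old'    # names dropped from the new source survive
```

The old bindings that do get overwritten are simply decref'd and left to the garbage collector; nothing is cleared beforehand.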
Even tp_clear() may not be enough, because it's only meant to clean up Python references, not C-level state. Basically, for reloading to be correct without changing the object reference, it would have to go all the way through tp_dealloc(), catch the object at the very end, right before it gets freed, and then re-initialise it.
This sounds like we need some kind of indirection (as you mentioned above), but without the API impact that a separate type implies. Simply making modules an arbitrary extension type, as I proposed, cannot solve this.
(Actually, my intuition tells me that if it can't really be made to work 100% for Python modules, e.g. due to the from-import case, why bother with it for extension types?)
To fix testing the C implementation of etree using the same model we use for other extension modules (that's loading a second copy rather than reloading in place, but the problems are related).
Such modules can still support reloading (e.g. to pick up reloaded or removed module dependencies) by providing PyImportExec_NAME as well.
(in a PEP 451 world, this would likely be split up as two separate functions, one for create, one for exec)
Can't we just always require extension modules to implement their own type? Sure, it's a lot of boilerplate code, but that could be handled by a simple C code generator or maybe even a copy&paste example in the docs. I would like to avoid making it too easy for users in the future to get anything wrong with reloading or sub-interpreters. Most people won't test these things for their own code, and the harder it is to get them wrong, the more likely it is that a given set of dependencies will properly work in a sub-interpreter.
If users are required to implement their own type, I think it would be more obvious where to put global module state, how to define functions (i.e. module methods), how to handle garbage collection at the global module level, etc.
Take a look at the current example - everything gets stored in the module dict for the simple case with no C level global state.
Well, you're storing types there. And those types are your module API. I understand that it's just an example, but I don't think it matches a common case. As far as I can see, the types are not even interacting with each other, let alone doing any C-level access of each other. We should try to focus on the normal case that needs C-level state and C-level field access of extension types. Once that's solved, we can still think about how to make the really simple cases simpler, if it turns out that they are not simple enough.
Our experience is very different - my perspective is that the normal case either eschews C level global state in the extension module, because it causes so many problems, or else just completely ignores subinterpreter support and proper module cleanup.
Keeping everything in the module dict is a design that (IMHO) is too error prone. C state should be kept safely at the C level, outside of the reach of Python code. I don't want users of my extension module to be able to provoke a crash by saying "extmodule._xyz = None".
So don't have global state in the *extension module*, then, keep it in the regular C/C++ modules. (And don't use the exec-only approach if you do have significant global state in the extension).
I didn't know about PyType_FromSpec(), BTW. It looks like a nice addition for manually written code (although useless for Cython).
This is the only way to create custom types when using the stable ABI. Can I take your observation to mean that Cython doesn't currently offer the option of limiting itself to the stable ABI? Cheers, Nick.
Stefan
Nick Coghlan, 31.08.2013 18:49:
On 25 Aug 2013 21:56, "Stefan Behnel" wrote:
One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype).
Is it really a common case for an extension module not to need any C level state at all? I mean, this might work for very simple accelerator modules with only a few stand-alone functions. But anything non-trivial will almost certainly have some kind of global state, cache, external library, etc., and that state is best stored at the C level for safety reasons.
In my experience, most extension authors aren't writing high performance C accelerators, they're exposing an existing C API to Python. It's the cffi use case rather than the Cython use case.
Interesting. I can't really remember a case where I could afford the runtime overhead of implementing a wrapper in Python and going through something like ctypes or cffi. I mean, testing C libraries with Python tools would be one, but then, you wouldn't want to write an extension module for that - you'd want to call the library from the test code as directly as possible. I'm certainly aware that that use case exists, though, and also the case of just wanting to get things done as quickly and easily as possible.
Mutable module global state is always a recipe for obscure bugs, and not something I will ever let through code review without a really good rationale. Hidden process global state is never good, just sometimes a necessary evil.
I'm not necessarily talking about mutable state. Rather about things like pre-initialised data or imported functionality. For example, I often have a bound method of a compiled regex lying around somewhere in my Python modules as a utility function. And the same kind of stuff exists in C code, some may be local to a class, but other things can well be module global. And given that we are talking about module internals here I'd always keep them at the C level rather than exposing them through the module dict. The module dict involves a much higher access overhead, in addition to the reduced safety due to user accessibility. Exported C-APIs are also a use case. You'd import the C-API of another module at init time and from that point on only go through function pointers etc. Those are (sub-)interpreter specific, i.e. they are module global state that is specific to the currently loaded module instances.
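The "bound method of a compiled regex" utility Stefan mentions looks like this in a Python module (the pattern itself is a made-up example):

```python
import re

# Module-level utility: a bound method of a precompiled pattern. The
# compiled regex is exactly the kind of pre-initialised, effectively
# immutable state that an extension module would keep at the C level.
find_words = re.compile(r"[A-Za-z]+").findall
```

The compile cost is paid once at import time; callers just see a module-level function.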
However, keep in mind my patch is currently just the part I can implement without PEP 451 module spec objects.
Understood.
Note that even global functions usually hold state, be it in the form of globally imported modules, global caches, constants, ...
If they can be shared safely across multiple instances of the module (e.g. immutable constants), then these can be shared at the C level. Otherwise, a custom Python type will be needed to make them instance specific.
I assume you meant a custom module (extension) type here. Just to be clear, the "module state at the C-level" is meant to be stored in the object struct fields of the extension type that implements the module, at least for modules that want to support reloading and sub-interpreters. Obviously, nothing should be stored in static (global) variables etc.
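Stefan's clarification - state stored per module instance rather than in statics - is the core of the PEP's thesis that modules may be instances of arbitrary types. A Python-level analogue (hypothetical names):

```python
import types

class StatefulModule(types.ModuleType):
    """Module whose state lives on the instance, not in statics.

    The C-level equivalent stores this state in the object struct of the
    module's extension type, so each loaded copy (main interpreter,
    sub-interpreter, reload) gets independent state.
    """
    def __init__(self, name):
        super().__init__(name)
        self._calls = 0              # per-instance state

    def ping(self):
        self._calls += 1
        return self._calls

m1 = StatefulModule("extdemo")
m2 = StatefulModule("extdemo")       # a second, independent "interpreter copy"
assert m1.ping() == 1 and m1.ping() == 2
assert m2.ping() == 1                # state is not shared between instances
```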
We also need the create/exec split to properly support reloading. Reload *must* reinitialize the object already in sys.modules instead of inserting a different object or it completely misses the point of reloading modules over deleting and reimporting them (i.e. implicitly affecting the references from other modules that imported the original object).
Interesting. I never thought of it that way.
I'm not sure this can be done in general. What if the module has threads running that access the global state? In that case, reinitialising the module object itself would almost certainly lead to a crash.
My current proposal on import-sig is to make the first hook "prepare_module", and pass in the existing object in the reload case. For the extension loader, this would be reflected in the signature of the C level hook as well, so the module could decide for itself if it supported reloading.
I really don't like the idea of reloading by replacing module state. It would be much simpler if the module itself would be replaced, then the original module could stay alive and could still be used by those who hold a reference to it or parts of its contents. Especially the from-import case would benefit from this. Obviously, you could still run into obscure bugs where a function you call rejects the input because it expects an older version of a type, for example. But I can't see that being worse (or even just different) from the reload-by-refilling-dict case. You seemed to be ok with my idea of making the loader return a wrapped extension module instead of the module itself. We should actually try that.
This is actually my primary motivation for trying to improve the "can this be reloaded or not?" aspects of the loader API in PEP 451.
I assume you mean that the extension module would be able to clearly signal that it can't be reloaded, right? I agree that that's helpful. If you're wrapping a C library, then the way that library is implemented might simply force you to prevent any attempts at reloading the wrapper module. But if reloading is possible at all, it would be even more helpful if we could make it really easy to properly support it.
(keep in mind existing extension modules using the existing API will still never be reloaded)
Sure, that's the cool thing. We can really design this totally from scratch without looking back.
Take a look at the current example - everything gets stored in the module dict for the simple case with no C level global state.
Well, you're storing types there. And those types are your module API. I understand that it's just an example, but I don't think it matches a common case. As far as I can see, the types are not even interacting with each other, let alone doing any C-level access of each other. We should try to focus on the normal case that needs C-level state and C-level field access of extension types. Once that's solved, we can still think about how to make the really simple cases simpler, if it turns out that they are not simple enough.
Our experience is very different - my perspective is that the normal case either eschews C level global state in the extension module, because it causes so many problems, or else just completely ignores subinterpreter support and proper module cleanup.
As soon as you have more than one extension type in your module, and they interact with each other, they will almost certainly have to do type checks against each other to make sure users haven't passed them rubbish before they access any C struct fields of the object. Doing a type check means that at least one type has a pointer to the other, meaning that it holds global module state. I really think that having some kind of global module state is the exceedingly common case for an extension module.
I didn't know about PyType_FromSpec(), BTW. It looks like a nice addition for manually written code (although useless for Cython).
This is the only way to create custom types when using the stable ABI. Can I take your observation to mean that Cython doesn't currently offer the option of limiting itself to the stable ABI?
Correct. I took a bird's-eye view of it back then, and kept stumbling over "wow - I couldn't even use that?" kind of declarations in the header files. I don't think it makes sense for Cython. Existing CPython versions are easy to support because they don't change anymore, and new major releases most likely need adaptations anyway, if only to adapt to new features and performance changes. Cython actually knows quite a lot about the inner workings of CPython and its various releases. Going only through the stable ABI parts of the C-API would make the code horribly slow in comparison, so there are huge drawbacks for the benefit it might give.

The Cython way of doing it is more like: you want your code to run on a new CPython version, then use a recent Cython release to compile it. It may still work with older ones, but what you actually want is the newest anyway, and you also want to compile the C code for the specific CPython version at hand to get the most out of it. It's the C code that adapts, not the runtime code (or Cython itself).

We run continuous integration tests with all of CPython's development branches since 2.4, so we usually support new CPython releases long before they are out. And new releases of CPython rarely affect Cython user code.

Stefan
On Sat, 31 Aug 2013 21:16:10 +0200 Stefan Behnel <stefan_ml@behnel.de> wrote:
Our experience is very different - my perspective is that the normal case either eschews C level global state in the extension module, because it causes so many problems, or else just completely ignores subinterpreter support and proper module cleanup.
As soon as you have more than one extension type in your module, and they interact with each other, they will almost certainly have to do type checks against each other to make sure users haven't passed them rubbish before they access any C struct fields of the object. Doing a type check means that at least one type has a pointer to the other, meaning that it holds global module state.
I really think that having some kind of global module state is the exceedingly common case for an extension module.
Since we are eating our own dogfood here (and the work which prompted this discussion was indeed about trying to make our extension modules more cleanup-friendly), it would be nice to take a look at the Modules directory and count which proportion of CPython extension modules have state. Caution: "state" is a bit vague here. Depending on which API you use, custom extension types can be a part of "module state". Regards Antoine.
Antoine Pitrou, 31.08.2013 21:27:
On Sat, 31 Aug 2013 21:16:10 +0200 Stefan Behnel wrote:
Our experience is very different - my perspective is that the normal case either eschews C level global state in the extension module, because it causes so many problems, or else just completely ignores subinterpreter support and proper module cleanup.
As soon as you have more than one extension type in your module, and they interact with each other, they will almost certainly have to do type checks against each other to make sure users haven't passed them rubbish before they access any C struct fields of the object. Doing a type check means that at least one type has a pointer to the other, meaning that it holds global module state.
I really think that having some kind of global module state is the exceedingly common case for an extension module.
Since we are eating our own dogfood here (and the work which prompted this discussion was indeed about trying to make our extension modules more cleanup-friendly), it would be nice to take a look at the Modules directory and count which proportion of CPython extension modules have state.
There seem to be 81 modules in there currently (grepped for PyMODINIT_FUNC). 16 of them come up when you grep for '(TypeCheck|IsInstance)', all using global extension type pointers. 32 use some kind of global "static PyObject* something;". Counting modules that match both greps only once, that's 41 - half of the modules already. I'm sure there's more if you dig deeper. Some modules only define functions or only one type (e.g. md5). They would get away with no global state, I guess - if they all used heap types.
Caution: "state" is a bit vague here. Depending on which API you use, custom extension types can be a part of "module state".
Yep, as I said. Stefan
On 1 Sep 2013 05:18, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
Nick Coghlan, 31.08.2013 18:49:
On 25 Aug 2013 21:56, "Stefan Behnel" wrote:
One key point to note is that it *doesn't* call _PyImport_FixupExtensionObject, which is the API that handles all the PEP 3121 per-module state stuff. Instead, the idea will be for modules that don't need additional C level state to just implement PyImportExec_NAME, while those that *do* need C level state implement PyImportCreate_NAME and return a custom object (which may or may not be a module subtype).
Is it really a common case for an extension module not to need any C level state at all? I mean, this might work for very simple accelerator modules with only a few stand-alone functions. But anything non-trivial will almost certainly have some kind of global state, cache, external library, etc., and that state is best stored at the C level for safety reasons.
In my experience, most extension authors aren't writing high performance C accelerators, they're exposing an existing C API to Python. It's the cffi use case rather than the Cython use case.
Interesting. I can't really remember a case where I could afford the runtime overhead of implementing a wrapper in Python and going through something like ctypes or cffi. I mean, testing C libraries with Python tools would be one, but then, you wouldn't want to write an extension module for that - you'd want to call the library from the test code as directly as possible.
I'm certainly aware that that use case exists, though, and also the case of just wanting to get things done as quickly and easily as possible.
Keep in mind I first came to Python as a tool for test automation of custom C++ hardware APIs that could be written to be SWIG friendly. I now work for an OS vendor where the 3 common languages for system utilities are C, C++ and Python. For those use cases, dropping a bunch of standard Python objects in a module dict is often going to be a quick and easy solution that avoids a lot of nasty pointer lifecycle issues at the C level. This style of extension code would suffer similar runtime checking overhead as Python, including for function calls, but, like CPython itself, would still often be "fast enough".

However, as soon as you want to manually optimise for *speed* at all, you're going to want to remove those module internal indirections through the Python API. There are at least three ways to do this (internally, CPython uses all of them in various places):

* type checks followed by direct function calls on the optimised path, falling back to the abstract object APIs on the compatibility path
* type checks followed by an exception for unknown types
* hidden state that isn't exposed directly at the Python level and hence can be trusted to only be changed through the module APIs

The third approach can be implemented in three ways, with various consequences:

* C static variables. For mutable state, including pointers to Python types, this breaks subinterpreters, reloading in place and loading a fresh copy of the module.
* PEP 3121 per-interpreter shared state. Handles subinterpreters, *may* handle reloading (but may segfault if references are held to old types and functions from before the reload), doesn't handle loading a fresh copy at all.
* PEP 3121 with a size of "0". As above, but avoids the module state APIs in order to support reloading. All module state (including type cross-references) is stored in hidden state (e.g. an instance of a custom type not exposed to Python, with a reference stored on each custom type object defined in the module, and any module level "functions" actually being methods of a hidden object). Still doesn't support loading a *fresh* copy due to the hidden PEP 3121 module cache.

The proposed new approach is to bypass the PEP 3121 cache entirely, and instead recommend providing an instance of a custom type to be placed in sys.modules. Extension modules will be given the ability to explicitly disallow in-place reloading *or* to make it work reliably, rather than the status quo where the import system assumes it will work, and instead may fail in unusual ways.
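The "hidden state object" variant can be sketched in pure Python (the names `_State`, `bump` and `make_namespace` are invented for this illustration): the module-level "functions" are really bound methods of an object that never escapes, so every fresh copy of the module gets its own independent state.

```python
# Pure-Python sketch of the "hidden state object" pattern: callers only
# ever see the bound methods, never the state-holding object itself.

class _State:
    """Hidden per-module-instance state (illustrative only)."""
    def __init__(self):
        self.counter = 0

    def bump(self):
        self.counter += 1
        return self.counter

def make_namespace():
    state = _State()
    # Only the bound method escapes; the _State instance stays hidden.
    return {"bump": state.bump}

ns1 = make_namespace()
ns2 = make_namespace()
ns1["bump"]()
ns1["bump"]()
print(ns1["bump"]())  # 3
print(ns2["bump"]())  # 1 -- the two "module instances" don't share state
```

In the C analogue, `_State` would be an extension type that is never added to the module dict, with each module-level callable holding a reference to one shared instance of it.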
Mutable module global state is always a recipe for obscure bugs, and not something I will ever let through code review without a really good rationale. Hidden process global state is never good, just sometimes a necessary evil.
I'm not necessarily talking about mutable state. Rather about things like pre-initialised data or imported functionality. For example, I often have a bound method of a compiled regex lying around somewhere in my Python modules as a utility function. And the same kind of stuff exists in C code, some may be local to a class, but other things can well be module global. And given that we are talking about module internals here I'd always keep them at the C level rather than exposing them through the module dict. The module dict involves a much higher access overhead, in addition to the reduced safety due to user accessibility.
Exported C-APIs are also a use case. You'd import the C-API of another module at init time and from that point on only go through function pointers etc. Those are (sub-)interpreter specific, i.e. they are module global state that is specific to the currently loaded module instances.
Due to refcounting, all instances of Python objects qualify as mutable state. Hopefully my elaboration above helps make it clear why I think it's worthwhile to clearly separate out the "no custom C level state needed" case.
However, keep in mind my patch is currently just the part I can implement without PEP 451 module spec objects.
Understood.
Note that even global functions usually hold state, be it in the form of globally imported modules, global caches, constants, ...
If they can be shared safely across multiple instances of the module (e.g. immutable constants), then these can be shared at the C level. Otherwise, a custom Python type will be needed to make them instance specific.
I assume you meant a custom module (extension) type here.
Not sure yet. For PEP 451, we still need to support arbitrary objects in sys.modules, so it's still possible that freedom will be made available to extension modules.
Just to be clear, the "module state at the C-level" is meant to be stored in the object struct fields of the extension type that implements the module, at least for modules that want to support reloading and sub-interpreters. Obviously, nothing should be stored in static (global) variables etc.
We also need the create/exec split to properly support reloading. Reload *must* reinitialize the object already in sys.modules instead of inserting a different object or it completely misses the point of reloading modules over deleting and reimporting them (i.e. implicitly affecting the references from other modules that imported the original object).
Interesting. I never thought of it that way.
I'm not sure this can be done in general. What if the module has threads running that access the global state? In that case, reinitialising the module object itself would almost certainly lead to a crash.

Right. That's why I want a way for loaders in general (and extension modules in particular) to clearly say "in-place reloading not supported", rather than Python blundering ahead with it and risking a crash.
My current proposal on import-sig is to make the first hook "prepare_module", and pass in the existing object in the reload case. For the extension loader, this would be reflected in the signature of the C level hook as well, so the module could decide for itself if it supported reloading.
I really don't like the idea of reloading by replacing module state. It would be much simpler if the module itself would be replaced, then the original module could stay alive and could still be used by those who hold a reference to it or parts of its contents. Especially the from-import case would benefit from this. Obviously, you could still run into obscure bugs where a function you call rejects the input because it expects an older version of a type, for example. But I can't see that being worse (or even just different) from the reload-by-refilling-dict case.
You seemed to be ok with my idea of making the loader return a wrapped extension module instead of the module itself. We should actually try that.

Sure, that's just a variant of the "hidden state object" idea I described above. It should actually work today with the PEP 3121 custom storage size set to zero.

Sure, this is what we do in the test suite in "test.support.import_fresh_module". It was actually Eli trying to use that in the etree tests that triggered our recent investigation of the limits of PEP 3121 (it breaks for stateful extension modules due to the per-interpreter caching). It's a different operation from imp.reload, though. Assuming we can get this stable and reliable in the new API, I expect we'll be able to add "imp.reload_fresh" as a supported API in 3.5.
This is actually my primary motivation for trying to improve the "can this be reloaded or not?" aspects of the loader API in PEP 451.
I assume you mean that the extension module would be able to clearly signal that it can't be reloaded, right? I agree that that's helpful. If you're wrapping a C library, then the way that library is implemented might simply force you to prevent any attempts at reloading the wrapper module. But if reloading is possible at all, it would be even more helpful if we could make it really easy to properly support it.
Yep, that's my goal (and why it's really good to be having this discussion while PEP 451 is still in development).
(keep in mind existing extension modules using the existing API will still never be reloaded)
Sure, that's the cool thing. We can really design this totally from scratch without looking back.
Well, not *quite*. We need to ensure that both APIs can coexist in the same module for source compatibility without nasty ifdef hacks, and that there is a reasonable migration path for existing handwritten extension modules.
Take a look at the current example - everything gets stored in the module dict for the simple case with no C level global state.
Well, you're storing types there. And those types are your module API. I understand that it's just an example, but I don't think it matches a common case. As far as I can see, the types are not even interacting with each other, let alone doing any C-level access of each other. We should try to focus on the normal case that needs C-level state and C-level field access of extension types. Once that's solved, we can still think about how to make the really simple cases simpler, if it turns out that they are not simple enough.
Our experience is very different - my perspective is that the normal case either eschews C level global state in the extension module, because it causes so many problems, or else just completely ignores subinterpreter support and proper module cleanup.
As soon as you have more than one extension type in your module, and they interact with each other, they will almost certainly have to do type checks against each other to make sure users haven't passed them rubbish before they access any C struct fields of the object. Doing a type check means that at least one type has a pointer to the other, meaning that it holds global module state.
Sure, but you can use the CPython API rather than writing normal C code. We do this fairly often in CPython when we're dealing with things stored in modules that can be manipulated from Python. It incurs CPython's dynamic dispatch overhead, but sometimes that's worth it to avoid needing to deal with C level lifecycle issues.
I really think that having some kind of global module state is the exceedingly common case for an extension module.
I wouldn't be willing to make the call about which of stateless vs stateful is more common without a lot more research :) They're both common enough that I think they should both be well supported, and making the "no custom C level state" case as simple as possible.
I didn't know about PyType_FromSpec(), BTW. It looks like a nice addition for manually written code (although useless for Cython).
This is the only way to create custom types when using the stable ABI. Can I take your observation to mean that Cython doesn't currently offer the option of limiting itself to the stable ABI?
Correct. I took a bird's eye view of it back then, and kept stumbling over "wow - I couldn't even use that?" kinds of declarations in the header files. I don't think it makes sense for Cython. Existing CPython versions are easy to support because they don't change anymore, and new major releases most likely need adaptations anyway, if only to adapt to new features and performance changes. Cython actually knows quite a lot about the inner workings of CPython and its various releases. Going only through the stable ABI parts of the C-API would make the code horribly slow in comparison, so there are huge drawbacks for the benefit it might give.
The Cython way of doing it is more like: you want your code to run on a new CPython version, then use a recent Cython release to compile it. It may still work with older ones, but what you actually want is the newest anyway, and you also want to compile the C code for the specific CPython version at hand to get the most out of it. It's the C code that adapts, not the runtime code (or Cython itself).
We run continuous integration tests with all of CPython's development branches since 2.4, so we usually support new CPython releases long before they are out. And new releases of CPython rarely affect Cython user code.
The main advantage of the stable ABI is being able to publish cross-version binary extension modules. I guess if Cython already supports generating binaries for each new version of CPython before we release it, that capability is indeed less useful than it is for those that are maintaining extension modules by hand. Cheers, Nick.
Stefan
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe:
http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
Nick Coghlan, 01.09.2013 03:28:
On 1 Sep 2013 05:18, "Stefan Behnel" wrote:
I can't really remember a case where I could afford the runtime overhead of implementing a wrapper in Python and going through something like ctypes or cffi. I mean, testing C libraries with Python tools would be one, but then, you wouldn't want to write an extension module for that and instead want to call it directly from the test code as directly as possible.
I'm certainly aware that that use case exists, though, and also the case of just wanting to get things done as quickly and easily as possible.
Keep in mind I first came to Python as a tool for test automation of custom C++ hardware APIs that could be written to be SWIG friendly.
Interesting again. Would you still do it that way? I recently had a discussion with Holger Krekel of py.test fame about testing C code with Cython, and we quickly agreed that wrapping the code in an extension module was both too cumbersome and too inflexible for testing purposes. Specifically, neither of Cython's top selling points fits here, not speed, not clarity, not API design. It's most likely different for SWIG, which involves less (not no, just less) manual work and gives you API-wise more or less exactly what you put in. However, cffi is almost certainly the better way to do it, because it gives you all sorts of flexibility for your test code without having to think about the wrapper design all the time. The situation is also different for C++ where you have fewer options for wrapping it. I can imagine SWIG still being the tool of choice on that front when it comes to bare and direct testing of large code bases.
I now work for an OS vendor where the 3 common languages for system utilities are C, C++ and Python.
For those use cases, dropping a bunch of standard Python objects in a module dict is often going to be a quick and easy solution that avoids a lot of nasty pointer lifecycle issues at the C level.
That's yet another use case, BTW. When you control the whole application, then safety doesn't really matter at these points and keeping a bunch of stuff in a dict will usually work just fine. I'm mainly used to writing libraries for (sometimes tons of) other people, in which case the requirements are so diverse on user side that safety is a top thing to care about. Anything you can keep inside of C code should stay there. (Especially when dealing with libxml2&friends in lxml which continuously present their 'interesting' usability characteristics.)
* PEP 3121 with a size of "0". As above, but avoids the module state APIs in order to support reloading. All module state (including type cross-references) is stored in hidden state (e.g. an instance of a custom type not exposed to Python, with a reference stored on each custom type object defined in the module, and any module level "functions" actually being methods of a hidden object).
Thanks for elaborating. I had completely failed to make the mental link that you could simply stick bound methods as functions into the module dict, i.e. that they don't even have to be methods of the module itself. That's something that Cython could already use in older CPythons, even as a preparation for any future import protocol changes. The object that they are methods of would then eventually become the module instance.

You'd still suffer a slight performance hit from going from a static global C variable to a pointer indirection - for everything: string constants, cached Python objects, all user defined global C variables would have to go there as Cython cannot know if they are module instance specific state or not (they usually will be, I guess). But that has to be done anyway if the goal is to get rid of static state to enable sub-interpreters. I can't wait seeing lxml run threaded in mod_wsgi... ;-)
You seemed to be ok with my idea of making the loader return a wrapped extension module instead of the module itself. We should actually try that.
Sure, that's just a variant of the "hidden state object" idea I described above. It should actually work today with the PEP 3121 custom storage size set to zero.
True. The only difference is whether you leave it to the extension type itself or make it a part of the loader architecture. Anyway, I promise I'll give it a try in Cython. Will be some work, though, to rewrite Cython's use of global variables, create a module state type, migrate everything to heap types, ... I had wanted to do that for a couple of years, but it's clearly not something for a happy afternoon or two. Plus, it would even have to be optional in the compiler to avoid performance regressions for modules that want to continue using fast static globals simply because they cannot support multiple instances anyway (e.g. due to external C library dependencies). Let's see if we can solve that at C compilation time by throwing in a couple of macros. That would at least help keeping the Cython compiler itself simple in that regard... (I guess it would also help with testing as we could just duplicate the test suite runs for both design modes)
As soon as you have more than one extension type in your module, and they interact with each other, they will almost certainly have to do type checks against each other to make sure users haven't passed them rubbish before they access any C struct fields of the object. Doing a type check means that at least one type has a pointer to the other, meaning that it holds global module state.
Sure, but you can use the CPython API rather than writing normal C code. We do this fairly often in CPython when we're dealing with things stored in modules that can be manipulated from Python.
It incurs CPython's dynamic dispatch overhead, but sometimes that's worth it to avoid needing to deal with C level lifecycle issues.
Not so much of a problem in Cython, because all you usually have to do to get fast C level access to something is to change a "def" into a "cdef" somewhere, or add a decorator, or an assignment to a known extension type variable. Once the module global state is 'virtualised', this will also be a safe thing to do in the face of multiple module instances, and still be much faster than going through Python calls.
I really think that having some kind of global module state is the exceedingly common case for an extension module.
I wouldn't be willing to make the call about which of stateless vs stateful is more common without a lot more research :)
They're both common enough that I think they should both be well supported, and making the "no custom C level state" case as simple as possible.
Agreed.
I didn't know about PyType_FromSpec(), BTW. It looks like a nice addition for manually written code (although useless for Cython).
This is the only way to create custom types when using the stable ABI.
I actually think I recall reading about it in the PEP back when it was designed, decided that it made sense in the given context, and then forgot about it as I didn't consider it relevant.
The main advantage of the stable ABI is being able to publish cross-version binary extension modules. I guess if Cython already supports generating binaries for each new version of CPython before we release it, that capability is indeed less useful than it is for those that are maintaining extension modules by hand.
I consider it mostly interesting for Linux distributions and closed source module vendors as it reduces the build/support overhead. But compiling the generated C code for the specific CPython version at hand really has some major advantages. Stefan
On 1 September 2013 18:11, Stefan Behnel <stefan_ml@behnel.de> wrote:
Nick Coghlan, 01.09.2013 03:28:
On 1 Sep 2013 05:18, "Stefan Behnel" wrote:
I can't really remember a case where I could afford the runtime overhead of implementing a wrapper in Python and going through something like ctypes or cffi. I mean, testing C libraries with Python tools would be one, but then, you wouldn't want to write an extension module for that and instead want to call it directly from the test code as directly as possible.
I'm certainly aware that that use case exists, though, and also the case of just wanting to get things done as quickly and easily as possible.
Keep in mind I first came to Python as a tool for test automation of custom C++ hardware APIs that could be written to be SWIG friendly.
Interesting again. Would you still do it that way? I recently had a discussion with Holger Krekel of py.test fame about testing C code with Cython, and we quickly agreed that wrapping the code in an extension module was both too cumbersome and too inflexible for testing purposes. Specifically, neither of Cython's top selling points fits here, not speed, not clarity, not API design. It's most likely different for SWIG, which involves less (not no, just less) manual work and gives you API-wise more or less exactly what you put in. However, cffi is almost certainly the better way to do it, because it gives you all sorts of flexibility for your test code without having to think about the wrapper design all the time.
The situation is also different for C++ where you have fewer options for wrapping it. I can imagine SWIG still being the tool of choice on that front when it comes to bare and direct testing of large code bases.
To directly wrap C++, I'd still use SWIG. It makes a huge difference when you can tweak the C++ side of the API to be SWIG friendly rather than having to live with whatever a third party C++ library provides. Having classes in C++ map directly to classes in Python is the main benefit of doing it this way over using a C wrapper and cffi. However, for an existing C API, or a custom API where I didn't need the direct object mapping that C++ can provide, using cffi would be a more attractive option than SWIG these days (the stuff I was doing with SWIG was back around 2003 or so). I think this is getting a little off topic for the list, though :)
I now work for an OS vendor where the 3 common languages for system utilities are C, C++ and Python.
For those use cases, dropping a bunch of standard Python objects in a module dict is often going to be a quick and easy solution that avoids a lot of nasty pointer lifecycle issues at the C level.
That's yet another use case, BTW. When you control the whole application, then safety doesn't really matter at these points and keeping a bunch of stuff in a dict will usually work just fine. I'm mainly used to writing libraries for (sometimes tons of) other people, in which case the requirements are so diverse on user side that safety is a top thing to care about. Anything you can keep inside of C code should stay there. (Especially when dealing with libxml2&friends in lxml which continuously present their 'interesting' usability characteristics.)
I don't think it's a coincidence that it was the etree interface with expat that highlighted the deficiencies of the current extension module hooks when it comes to working properly with test.support.import_fresh_module :)
* PEP 3121 with a size of "0". As above, but avoids the module state APIs in order to support reloading. All module state (including type cross-references) is stored in hidden state (e.g. an instance of a custom type not exposed to Python, with a reference stored on each custom type object defined in the module, and any module level "functions" actually being methods of a hidden object).
Thanks for elaborating. I had completely failed to make the mental link that you could simply stick bound methods as functions into the module dict, i.e. that they don't even have to be methods of the module itself. That's something that Cython could already use in older CPythons, even as a preparation for any future import protocol changes. The object that they are methods of would then eventually become the module instance.
You'd still suffer a slight performance hit from going from a static global C variable to a pointer indirection - for everything: string constants, cached Python objects, all user defined global C variables would have to go there as Cython cannot know if they are module instance specific state or not (they usually will be, I guess). But that has to be done anyway if the goal is to get rid of static state to enable sub-interpreters. I can't wait seeing lxml run threaded in mod_wsgi... ;-)
To be honest, I didn't realise that such a trick might already be possible until I was writing down this list of alternatives. If you manage to turn it into a real solution for lxml (or Cython in general), it would be great to hear more about how you turned the general idea into something real :)

That means the powers any new extension initialisation API will offer will be limited to:

* letting the module know its own name (and other details)
* letting the module explicitly block reloading
* letting the module support loading multiple copies at once by taking the initial import out of sys.modules (but keeping a separate reference to it alive)

<snip>
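The third of those powers already works for pure-Python modules today; a sketch of "loading a fresh copy" by taking the cached entry out of sys.modules while keeping a reference to the original alive (the stdlib `json` module is chosen arbitrarily here as one that tolerates this):

```python
# Load a second, independent copy of a pure-Python stdlib module.
import importlib
import sys

import json
original = json                     # keep the original alive

del sys.modules["json"]             # forget the cached copy
fresh = importlib.import_module("json")

assert fresh is not original        # two distinct module objects
assert fresh.dumps([1]) == original.dumps([1])

sys.modules["json"] = original      # restore the original cached copy
```

This is essentially what `test.support.import_fresh_module` automates; PEP 3121's hidden per-interpreter cache is what stops stateful extension modules from supporting the same trick.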
As soon as you have more than one extension type in your module, and they interact with each other, they will almost certainly have to do type checks against each other to make sure users haven't passed them rubbish before they access any C struct fields of the object. Doing a type check means that at least one type has a pointer to the other, meaning that it holds global module state.
Sure, but you can use the CPython API rather than writing normal C code. We do this fairly often in CPython when we're dealing with things stored in modules that can be manipulated from Python.
It incurs CPython's dynamic dispatch overhead, but sometimes that's worth it to avoid needing to deal with C level lifecycle issues.
Not so much of a problem in Cython, because all you usually have to do to get fast C level access to something is to change a "def" into a "cdef" somewhere, or add a decorator, or an assignment to a known extension type variable. Once the module global state is 'virtualised', this will also be a safe thing to do in the face of multiple module instances, and still be much faster than going through Python calls.
It's kinda cool how designing a next generation API can sometimes reveal hidden possibilities of an *existing* API :) We had another one of those recently on distutils-sig, when I realised the much-maligned .pth modules are actually a decent solution to sharing distributions between virtual environments. I'm so used to disliking their global side effects when used with the system Python that it took me a long time to recognise the validity of using them to make implicit path additions in a more controlled virtual environment :)

In terms of where we go from here - do you mind if I use your pre-PEP as the initial basis for a PEP of my own some time in the next week or two (listing you as co-author)? Improving extension module initialisation has been the driver for most of the PEP 451 feedback I've been giving to Eric over on import-sig, so I have some definite ideas on how I think that API should look :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan, 01.09.2013 14:23:
That means the powers any new extension initialisation API will offer will be limited to:
* letting the module know its own name (and other details)
* letting the module explicitly block reloading
* letting the module support loading multiple copies at once by taking the initial import out of sys.modules (but keeping a separate reference to it alive)
Which, all by themselves, can be considered a huge benefit, IMHO. Plus, if we design the protocol broad enough now, specifically as a two-way interface (info in, module out), we won't have to make any major changes to it again anywhere in the near future, because incremental changes can just be integrated into what's there then, in case we need any. It's sad that we didn't see these requirements for Py3.0.
In terms of where we go from here - do you mind if I use your pre-PEP as the initial basis for a PEP of my own some time in the next week or two (listing you as co-author)? Improving extension module initialisation has been the driver for most of the PEP 451 feedback I've been giving to Eric over on import-sig, so I have some definite ideas on how I think that API should look :)
Go for it. I'm not sure how much time I can actively spend on this during the next weeks anyway, so I'm happy if this continues to get pushed onwards in the meantime. Stefan
On Sun, 1 Sep 2013 11:28:36 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
* PEP 3121 with a size of "0". As above, but avoids the module state APIs in order to support reloading. All module state (including type cross-references) is stored in hidden state (e.g. an instance of a custom type not exposed to Python, with a reference stored on each custom type object defined in the module, and any module level "functions" actually being methods of a hidden object). Still doesn't support loading a *fresh* copy due to the hidden PEP 3121 module cache.
Not sure what you mean by that:
>>> import atexit
>>> id(atexit)
140031896222680
>>> import sys
>>> del sys.modules['atexit']
>>> import atexit
>>> id(atexit)
140031896221400
Due to refcounting, all instances of Python objects qualify as mutable state.
That's an overly broad definition. Many objects are shared between subinterpreters without any problems (None, the empty tuple, built-in types and most C extension types, etc.). As long as the state is an internal implementation detail, there shouldn't be any problem.
I wouldn't be willing to make the call about which of stateless vs stateful is more common without a lot more research :)
They're both common enough that I think they should both be well supported, and making the "no custom C level state" case as simple as possible.
Agreed. Regards Antoine.
On 1 September 2013 23:03, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sun, 1 Sep 2013 11:28:36 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
* PEP 3121 with a size of "0". As above, but avoids the module state APIs in order to support reloading. All module state (including type cross-references) is stored in hidden state (e.g. an instance of a custom type not exposed to Python, with a reference stored on each custom type object defined in the module, and any module level "functions" actually being methods of a hidden object). Still doesn't support loading a *fresh* copy due to the hidden PEP 3121 module cache.
Not sure what you mean by that:
>>> import atexit
>>> id(atexit)
140031896222680
>>> import sys
>>> del sys.modules['atexit']
>>> import atexit
>>> id(atexit)
140031896221400
Ah, you're right - I misremembered the exact problem that broke xml.etree.ElementTree testing. PyModule_GetState is actually fine (since that pointer is hidden state on the module object), it's only PyState_GetModule that is broken when you import a second copy.

So, here, when the second import happens, it breaks the original atexit module's callbacks, even though the two callback registries are properly isolated:

$ ./python
Python 3.4.0a1+ (default:575071257c92+, Aug 25 2013, 00:42:17)
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import atexit
>>> atexit.register(print, "Hello World!")
<built-in function print>
>>> import sys
>>> del sys.modules["atexit"]
>>> import atexit as atexit2
>>> atexit2.register(print, "Goodbye World!")
<built-in function print>
Goodbye World!
So I think PEP 3121 is actually as good as we can get on the hidden state front, but the important point is that it is the *PyState_GetModule* API that can't handle fresh imports - the second import will always replace the first one. So anyone affected needs to find some other way of passing the state, like using bound methods of a hidden type rather than ordinary callables. If you have to interoperate with a C API that only accepts a C callback without allowing additional state arguments, you're going to have trouble.

I think atexit serves as a good example, though - that _Py_PyAtExit call will *always* be destructive (even if you still have a reference to the original module), so there should be a way for the module to explicitly indicate to the import system "you can only create this module once, and then you're committed - unloading it and importing it again won't work properly due to side effects on the process state".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, 2 Sep 2013 00:10:08 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
$ ./python
Python 3.4.0a1+ (default:575071257c92+, Aug 25 2013, 00:42:17)
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import atexit
>>> atexit.register(print, "Hello World!")
<built-in function print>
>>> import sys
>>> del sys.modules["atexit"]
>>> import atexit as atexit2
>>> atexit2.register(print, "Goodbye World!")
<built-in function print>
Goodbye World!
Yeah, atexit is a very particular example, because it interacts with global state by design (the main interpreter instance), and no amount of module initialization magic can prevent that :-) Speaking of which, it also doesn't work (well) with subinterpreters: http://bugs.python.org/issue18618 Regards Antoine.
Speaking of which, it also doesn't work (well) with subinterpreters:
Could someone briefly explain 'subinterpreter' or point me somewhere in the docs? It appears throughout this thread but there is no index or glossary entry. -- Terry Jan Reedy
On Sun, 01 Sep 2013 16:02:33 -0400 Terry Reedy <tjreedy@udel.edu> wrote:
Speaking of which, it also doesn't work (well) with subinterpreters:
Could someone briefly explain 'subinterpreter' or point me somewhere in the docs? It appears throughout this thread but there is no index or glossary entry.
http://docs.python.org/dev/c-api/init.html#sub-interpreter-support Subinterpreters are a somewhat borderline feature that allows embedding applications to host multiple Python programs in a single process. A well-known example is mod_wsgi. Regards Antoine.
Antoine Pitrou, 01.09.2013 22:06:
On Sun, 01 Sep 2013 16:02:33 -0400 Terry Reedy wrote:
Speaking of which, it also doesn't work (well) with subinterpreters:
Could someone briefly explain 'subinterpreter' or point me somewhere in the docs? It appears throughout this thread but there is no index or glossary entry.
http://docs.python.org/dev/c-api/init.html#sub-interpreter-support
Subinterpreters are a somewhat borderline feature that allows embedding applications to host multiple Python programs in a single process. A well-known example is mod_wsgi.
And extension modules usually don't play well with subinterpreters because each subinterpreter requires its own separate version of the module and extension modules are rarely designed to keep their state completely local to an interpreter, let alone being prepared for having their module init function be called more than once. Stefan
On 9/1/2013 5:13 PM, Stefan Behnel wrote:
Antoine Pitrou, 01.09.2013 22:06:
On Sun, 01 Sep 2013 16:02:33 -0400 Terry Reedy wrote:
Speaking of which, it also doesn't work (well) with subinterpreters:
Could someone briefly explain 'subinterpreter' or point me somewhere in the docs? It appears throughout this thread but there is no index or glossary entry.
http://docs.python.org/dev/c-api/init.html#sub-interpreter-support
So cpython specific.
Subinterpreters are a somewhat borderline feature that allows embedding applications to host multiple Python programs in a single process. A well-known example is mod_wsgi.
Thank you for both the link *and* the explanatory example, which is just what I needed to make the past discussion more intelligible. I imagine that mod_wsgi uses a sub-interpreter for each user connection.
And extension modules usually don't play well with subinterpreters because each subinterpreter requires its own separate version of the module and extension modules are rarely designed to keep their state completely local to an interpreter, let alone being prepared for having their module init function be called more than once.
I can see now why this is a bit of a 'hair-puller';-). -- Terry Jan Reedy
On Sat, Aug 31, 2013 at 1:16 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Nick Coghlan, 31.08.2013 18:49:
This is actually my primary motivation for trying to improve the "can this be reloaded or not?" aspects of the loader API in PEP 451.
I assume you mean that the extension module would be able to clearly signal that it can't be reloaded, right? I agree that that's helpful. If you're wrapping a C library, then the way that library is implemented might simply force you to prevent any attempts at reloading the wrapper module. But if reloading is possible at all, it would be even more helpful if we could make it really easy to properly support it.
When loader.exec_module() gets called, it should raise ImportError if the module does not support reloading. -eric
On Fri, Aug 23, 2013 at 4:50 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Reloading and Sub-Interpreters ==============================
To "reload" an extension module, the module create function is executed again and returns a new module type. This type is then instantiated as by the original module loader and replaces the previous entry in sys.modules. Once the last references to the previous module and its type are gone, both will be subject to normal garbage collection.
I haven't had a chance to address this on the import-sig discussion yet about ModuleSpec, but I would like to just mention that one property of the existing module system that I'm not sure either this proposal or the ModuleSpec proposal preserves is that it's possible to implement lazy importing of modules using standard reload() semantics.

My "Importing" package offers lazy imports by creating module objects in sys.modules that are a subtype of ModuleType, and use a __getattribute__ hook so that trying to use them fires off a reload() of the module. Because the dummy module doesn't have __file__ or anything else initialized, the import system searches for the module and then loads it, reusing the existing module object, even though it's actually only executing the module code for the first time.

That the existing object be reused is important, because once the dummy is in sys.modules, it can also be imported by other modules, so references to it can abound everywhere, and we wish only for it to be loaded lazily, without needing to trace down and replace all instances of it. This also preserves other invariants of the module system.

Anyway, the reason I was asking why reloading is being handled as a special case in the ModuleSpec proposal -- and the reason I'm curious about certain provisions of this proposal -- is that making the assumption you can only reload something with the same spec/location/etc. it was originally loaded with, and/or that if you are reloading a module then you previously had a chance to do things to it, doesn't jibe with the way things work currently.

That is to say, in the pure PEP 302 world, there is no special status for "reload" that is different from "load" -- the *only* thing that's different is that there is already a module object to use, and there is *no guarantee that it's a module object that was initialized by the loader now being invoked*.
AFAICT both this proposal and the ModuleSpec one are making an invalid assumption per PEP 302, and aren't explicitly proposing to change the status quo: they just assume things that aren't actually assured by the prior specs or implementations.

So, for example, this extension module proposal needs to cover what happens if an extension module is reloaded and the module object is not of the type or instance it's expecting. Must it do its own checking? Error handling? Will some other portion of the import system be expected to handle it?

For that matter, what happens (in either proposal) if you reload() a module which only has a __name__, and no other attributes? I haven't tested with importlib, but with earlier Pythons this results in a standard module search being done by reload(). But the ModuleSpec proposal and this one seem to assume that a reload()-ed module must already be associated with a loader, location, and/or spec.
On 25 August 2013 14:12, PJ Eby <pje@telecommunity.com> wrote:
That is to say, in the pure PEP 302 world, there is no special status for "reload" that is different from "load" -- the *only* thing that's different is that there is already a module object to use, and there is *no guarantee that it's a module object that was initialized by the loader now being invoked*.
Yeah, this is an aspect of why I'd like PEP 451 to use create & exec for the new loader API components. That way, any loader which either doesn't define the create method, or which returns NotImplemented from the call (a subtlety needed to make this work for C extensions), can be used with reload *and* with the -m switch via runpy (currently runpy demands the ability to get hold of the code object).
AFAICT both this proposal and the ModuleSpec one are making an invalid assumption per PEP 302, and aren't explicitly proposing to change the status quo: they just assume things that aren't actually assured by the prior specs or implementations.
Indeed. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi, thanks for bringing this up. It clearly shows that there is more to this problem than I initially thought. Let me just add one idea that your post gave me. PJ Eby, 25.08.2013 06:12:
My "Importing" package offers lazy imports by creating module objects in sys.modules that are a subtype of ModuleType, and use a __getattribute__ hook so that trying to use them fires off a reload() of the module.
I wonder if this wouldn't be an approach to fix the reloading problem in general. What if extension module loading, at least with the new scheme, didn't return the module object itself and put it into sys.modules but created a wrapper that redirects its __getattr__ and __setattr__ to the actual module object? That would have a tiny performance impact on attribute access, but I'd expect that to be negligible given that the usual reason for the extension module to exist is that it does non-trivial stuff in whatever its API provides. Reloading could then really create a completely new module object and replace the reference inside of the wrapper.

That way, code that currently uses "from extmodule import xyz" would continue to see the original version of the module as of the time of its import, and code that just did "import extmodule" and then used attribute access at need would always see the current content of the module as it was last loaded. I think that, together with keeping module global state in the module object itself, would nicely fix both cases.

Stefan
*bump*

Does this sound like a viable solution?

Stefan
On 1 Sep 2013 00:10, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
*bump*
Does this sound like a viable solution?
This isn't likely to progress until we have Eric's PEP 451 to a point where it's ready for python-dev discussion and pronouncement. However, the revised loader API is being designed to allow for the loader returning arbitrary objects, so something along these lines should work. There will likely be some adjustments to the API signature to allow extension modules to optionally support reloading if they so desire. Cheers, Nick.
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ncoghlan%40gmail.com
Le Sun, 1 Sep 2013 02:19:48 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
On 1 Sep 2013 00:10, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
*bump*
Does this sound like a viable solution?
This isn't likely to progress until we have Eric's PEP 451 to a point where it's ready for python-dev discussion and pronouncement.
However, the revised loader API is being designed to allow for the loader returning arbitrary objects, so something along these lines should work. There will likely be some adjustments to the API signature to allow extension modules to optionally support reloading if they so desire.
I think the biggest challenge here is to propose an API that's simple and easy to use (i.e. that doesn't make extension module writing more complicated than it currently is).

The basic concept of putting custom module objects in sys.modules is sound, IMHO. As for "extension module as a wrapper", though, it sounds like the kind of complication I would personally prefer to stay away from. Also, it would make extension modules less like Python modules, rather than more.

Regards

Antoine.
On 2 September 2013 18:16, Antoine Pitrou <solipsis@pitrou.net> wrote:
Le Sun, 1 Sep 2013 02:19:48 +1000, Nick Coghlan <ncoghlan@gmail.com> a écrit :
On 1 Sep 2013 00:10, "Stefan Behnel" <stefan_ml@behnel.de> wrote:
*bump*
Does this sound like a viable solution?
This isn't likely to progress until we have Eric's PEP 451 to a point where it's ready for python-dev discussion and pronouncement.
However, the revised loader API is being designed to allow for the loader returning arbitrary objects, so something along these lines should work. There will likely be some adjustments to the API signature to allow extension modules to optionally support reloading if they so desire.
I think the biggest challenge here is to propose an API that's simple and easy to use (i.e. that doesn't make extension module writing more complicated than it currently is).
The basic concept of putting custom module objects in sys.modules is sound, IMHO.
The hook API I currently have in mind is a two step initialisation:

PyImport_PrepareNAME (optional)
PyImport_ExecNAME

If you don't define prepare, the import system takes care of creating a module object for you, and passing it in to the exec hook. The return result from that is just an integer indicating success or failure (on failure, an exception should be set).

If you *do* define the prepare hook, then it's similar to the existing init hook, but receives a PEP 451 module spec object with info about the module being imported (see PEP 451 for the draft details) and is permitted to return an arbitrary PyObject reference.

The main open questions I have are how to deal with clearly indicating whether modules support in-place reloading, unloading, loading in subinterpreters and/or loading a second copy in the same interpreter. That's actually more a question for PEP 451 though, so I'll post some more detailed thoughts on that over on import-sig, which may eventually make their way into a PEP 451 draft.
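In Python terms, this two-step hook corresponds to the create/exec split that PEP 451 eventually standardised as create_module()/exec_module(). A toy driver (class and attribute names here are illustrative only, and the C-level dispatch is simplified away):

```python
from types import ModuleType

class TwoStepLoader:
    """Toy Python-level analogue of the proposed two-step C hooks
    (PyImport_Prepare / PyImport_Exec)."""

    def create_module(self, spec):
        # Returning None means "use the default module creation",
        # mirroring a missing (or NotImplemented-returning) prepare hook.
        # A real loader could return an arbitrary object here.
        return None

    def exec_module(self, module):
        # Second step: run the module "body" against whatever object the
        # first step (or the import system) produced.
        module.answer = 42

# Driving the two steps by hand, the way the import system would:
loader = TwoStepLoader()
mod = loader.create_module(spec=None) or ModuleType('example')
loader.exec_module(mod)
```

The point of the split is that reload and runpy only need the exec step: they can reuse an existing module object instead of demanding a brand new one from the loader.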
As for "extension module as a wrapper", though, it sounds like the kind of complication I would personally prefer to stay away from. Also, it would make extension modules less like Python modules, rather than more.
Yeah, that will be allowed (since we'll probably permit returning arbitrary objects), but definitely not required. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Sep 2, 2013 at 7:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The hook API I currently have in mind is a two step initialisation:
PyImport_PrepareNAME (optional) PyImport_ExecNAME
Should we also look at an API change for the initfunc() of PyImport_Inittab entries? Currently the function takes a module name, which doesn't jibe with loader.exec_module() taking a module. I noticed this while adding an exec_module() to BuiltinImporter. I suppose the same thing goes for PyImport_ImportFrozenModuleObject().

-eric
Eric Snow, 08.09.2013 00:22:
On Mon, Sep 2, 2013 at 7:02 AM, Nick Coghlan wrote:
The hook API I currently have in mind is a two step initialisation:
PyImport_PrepareNAME (optional) PyImport_ExecNAME
Should we also look at an API change for the initfunc() of PyImport_Inittab entries? Currently the function takes a module name, which doesn't jive with loader.exec_module() taking a module. I noticed this while adding an exec_module() to BuiltinImporter. I suppose the same thing goes for PyImport_ImportFrozenModuleObject().
Is it still the case that the inittab mechanism only works for the embedding case? It would be nice to have a declarative mechanism for registering a set of modules from a running module init function. Stefan
On Mon, Sep 2, 2013 at 2:16 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think the biggest challenge here is to propose an API that's simple and easy to use (i.e. that doesn't make extension module writing more complicated than it currently is).
+1
The basic concept of putting custom module objects in sys.modules is sound, IMHO.
As for "extension module as a wrapper", though, it sounds like the kind of complication I would personally prefer to stay away from. Also, it would make extension modules less like Python modules, rather than more.
It all depends on how useful it would be to be able to safely reload extension modules from their files. -eric
On Sat, Aug 24, 2013 at 7:07 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
PEP 3121 would no longer be necessary. Extension types can do all we need. No more special casing of modules, that was the idea.
One nice thing about PEP 3121 is the addition of md_state to module objects to store internal module state. Wouldn't we be better served by improving the related API rather than abandoning it?

If md_state were the home for all mutable internal state then load/reload could focus directly on just md_state and md_dict and not worry about other internal state, since all remaining state would be immutable (refcounts notwithstanding). If the API made this easier then we could leverage the strengths of PEP 3121 to make loading safer and more independent. Of course, we could certainly go the other way and actively discourage mutable internal state...

This, coupled with the PEP 451-compatible API and with a proxying wrapper, would go a long way towards resolving the various "reloading" issues that extension modules have.

On Sun, Aug 25, 2013 at 5:54 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
(regarding reloading into the existing module's namespace)
I'm not sure this can be done in general. What if the module has threads running that access the global state? In that case, reinitialising the module object itself would almost certainly lead to a crash.
And what if you do "from extmodule import some_function" in a Python module? Then reloading couldn't replace that reference, just as for normal Python modules. Meaning that you'd still have to keep both modules properly alive in order to prevent crashes due to lost global state of the imported function.
The difference to Python modules here is that in Python code, you'll get some kind of exception if state is lost during a reload. In C code, you'll most likely get a crash.
How would you even make sure global state is properly cleaned up? Would you call tp_clear() on the module object before re-running the init code? Or how else would you enable the init code to do the right thing during both the first run (where global state is uninitialised) and subsequent runs (where global state may hold valid state and owned Python references)?
Even tp_clear() may not be enough, because it's only meant to clean up Python references, not C-level state. Basically, for reloading to be correct without changing the object reference, it would have to go all the way through tp_dealloc(), catch the object at the very end, right before it gets freed, and then re-initialise it.
Right. It would probably require a separate `PyImportInitializeState_<module>(PyObject *mod)` and/or some API that helps make it easier to manage mutable internal module state (on md_state). On Sun, Aug 25, 2013 at 6:36 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
PJ Eby, 25.08.2013 06:12:
My "Importing" package offers lazy imports by creating module objects in sys.modules that are a subtype of ModuleType, and use a __getattribute__ hook so that trying to use them fires off a reload() of the module.
I wonder if this wouldn't be an approach to fix the reloading problem in general. What if extension module loading, at least with the new scheme, didn't return the module object itself and put it into sys.modules but created a wrapper that redirects its __getattr__ and __setattr__ to the actual module object? That would have a tiny performance impact on attribute access, but I'd expect that to be negligible given that the usual reason for the extension module to exist is that it does non-trivial stuff in whatever its API provides. Reloading could then really create a completely new module object and replace the reference inside of the wrapper.
That way, code that currently uses "from extmodule import xyz" would continue to see the original version of the module as of the time of its import, and code that just did "import extmodule" and then used attribute access at need would always see the current content of the module as it was last loaded. I think that, together with keeping module global state in the module object itself, would nicely fix both cases.
At first blush I like this.

-eric

p.s. Bear with me if I've missed something in the thread. I'm slogging through a backlog of email.
On Thu, 5 Sep 2013 23:26:31 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Sat, Aug 24, 2013 at 7:07 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
PEP 3121 would no longer be necessary. Extension types can do all we need. No more special casing of modules, that was the idea.
One nice thing about PEP 3121 is the addition of md_state to module objects to store internal module state. Wouldn't we be better served by improving the related API rather than abandoning it?
md_state isn't a PyObject and therefore its lifetime management is quirky (like Py_buffer: same bad idea). So I'd be happy for it to disappear from the next API.
This, coupled with the PEP 451-compatible API and with a proxying wrapper, would go a long way to various "reloading" issues that extension modules have.
Proxying wrapper? We shouldn't need that kind of tricks. Regards Antoine.
On Sat, Aug 24, 2013 at 10:12 PM, PJ Eby <pje@telecommunity.com> wrote:
I haven't had a chance to address this on the import-sig discussion yet about ModuleSpec, but I would like to just mention that one property of the existing module system that I'm not sure either this proposal or the ModuleSpec proposal preserves is that it's possible to implement lazy importing of modules using standard reload() semantics.
My "Importing" package offers lazy imports by creating module objects in sys.modules that are a subtype of ModuleType, and use a __getattribute__ hook so that trying to use them fires off a reload() of the module. Because the dummy module doesn't have __file__ or anything else initialized, the import system searches for the module and then loads it, reusing the existing module object, even though it's actually only executing the module code for the first time.
That the existing object be reused is important, because once the dummy is in sys.modules, it can also be imported by other modules, so references to it can abound everywhere, and we wish only for it to be loaded lazily, without needing to trace down and replace all instances of it. This also preserves other invariants of the module system.
Anyway, the reason I was asking why reloading is being handled as a special case in the ModuleSpec proposal -- and the reason I'm curious about certain provisions of this proposal -- is that making the assumption you can only reload something with the same spec/location/etc. it was originally loaded with, and/or that if you are reloading a module then you previously had a chance to do things to it, doesn't jibe with the way things work currently.
That is to say, in the pure PEP 302 world, there is no special status for "reload" that is different from "load" -- the *only* thing that's different is that there is already a module object to use, and there is *no guarantee that it's a module object that was initialized by the loader now being invoked*.
In Python 3.3 (#13959) imp.reload() was updated to reuse a module's __loader__, which must now be set. If __loader__ is not set, you get an AttributeError. If that's a problem we can create a tracker issue and discuss there. In Python 3.4 imp.reload() is just an alias to importlib.reload(), but it works basically the same.

With ModuleSpec things won't work that differently. If you reload such a module as you described, it will look for __spec__ and call its reload() method. If __spec__ is not set, you get an AttributeError. It wouldn't be that hard to build a spec from the module if need be and then use that.

-eric
AFAICT both this proposal and the ModuleSpec one are making an invalid assumption per PEP 302, and aren't explicitly proposing to change the status quo: they just assume things that aren't actually assured by the prior specs or implementations.
So, for example, this extension module proposal needs to cover what happens if an extension module is reloaded and the module object is not of the type or instance it's expecting. Must it do its own checking? Error handling? Will some other portion of the import system be expected to handle it?
For that matter, what happens (in either proposal) if you reload() a module which only has a __name__, and no other attributes? I haven't tested with importlib, but with earlier Pythons this results in a standard module search being done by reload(). But the ModuleSpec proposal and this one seem to assume that a reload()-ed module must already be associated with a loader, location, and/or spec.
participants (7)
- Antoine Pitrou
- Benjamin Peterson
- Eric Snow
- Nick Coghlan
- PJ Eby
- Stefan Behnel
- Terry Reedy