Python subinterpreters support problem in v0.29
Hi Cython developers,

In the recent Cython 0.29 release, a commit [1] was introduced that hinders the use of Python subinterpreters.

I discovered this the hard way when a component I was working on suddenly started to crash. The component in question is the ceph-mgr daemon from the Ceph project [2].

Python subinterpreters are the basic building block of the plugin/module architecture of ceph-mgr. Each "manager module" runs in its own Python subinterpreter. Furthermore, all Python bindings for the Ceph client libraries, such as librados, librbd, libcephfs, and librgw, are implemented as Cython modules, and in the particular case of librados, all ceph-mgr plugin modules import the rados Cython module upon initialization.

In practice, with Cython 0.29 we can only load one module, because all subsequent modules refuse to load.

After discovering this issue, we "temporarily" worked around it by restricting the Cython version in our dependencies [3]. But we don't want to keep this restriction indefinitely and would prefer a fix on the Cython side.

Do you think it's feasible to implement a flag to disable the safeguard introduced in [1]? That way we could re-enable subinterpreters at our own risk.

[1] https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d...
[2] https://github.com/ceph/ceph
[3] https://github.com/ceph/ceph/pull/25328

--
Ricardo Dias
Senior Software Engineer - Storage Team
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Ricardo Dias wrote on 10.12.18 at 14:42:
> In the recent Cython 0.29 release, a commit [1] was introduced that hinders the use of Python subinterpreters.
>
> I discovered this the hard way when a component I was working on suddenly started to crash. The component in question is the ceph-mgr daemon from the Ceph project [2].
>
> Python subinterpreters are the basic building block of the plugin/module architecture of ceph-mgr. Each "manager module" runs in its own Python subinterpreter. Furthermore, all Python bindings for the Ceph client libraries, such as librados, librbd, libcephfs, and librgw, are implemented as Cython modules, and in the particular case of librados, all ceph-mgr plugin modules import the rados Cython module upon initialization.
>
> In practice, with Cython 0.29 we can only load one module, because all subsequent modules refuse to load.
>
> After discovering this issue, we "temporarily" worked around it by restricting the Cython version in our dependencies [3]. But we don't want to keep this restriction indefinitely and would prefer a fix on the Cython side.
>
> Do you think it's feasible to implement a flag to disable the safeguard introduced in [1]? That way we could re-enable subinterpreters at our own risk.
>
> [1] https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d...
> [2] https://github.com/ceph/ceph
> [3] https://github.com/ceph/ceph/pull/25328
My guess is that your modules just silently leaked object references and memory with the previous Cython versions. That is why we now inserted a guard that detects cases where the module init function is executed multiple times, which would overwrite the state of the previous run. The shared library of an extension module is only loaded once, so any global C state is shared for the entire process, regardless of how often CPython calls the module init function.

I am surprised that your setup didn't crash in any way. Could you explain a bit more how you are using this feature? Are the different subinterpreters running in parallel or sequentially? The ceph repo looks huge. Any pointers where I should start looking?

I actually wonder if we could at least support sequential usages through the module cleanup mechanism. Once a module is cleaned up and all global objects freed, calling the module init function again should be ok.

Apart from that, here is the feature ticket for module specific global state:
https://github.com/cython/cython/issues/2343

Stefan
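The effect described in this reply can be modelled in a few lines of plain Python. This is only a simulation under stated assumptions: no real extension module or subinterpreter is involved, it is not the code Cython generates, and every name in it (`SharedLibrary`, `module_init`, `module_cleanup`, `load_in_interpreters`, the interpreter labels) is hypothetical. It shows one process-wide state being overwritten by a second init call, a guard that refuses that second call, and the sequential case where cleanup makes a new init call acceptable.

```python
# Simulation only: plain Python standing in for the C globals of one shared library.
# All names are hypothetical; this is not Cython's generated code.


class SharedLibrary:
    """Models one loaded extension module: its globals exist once per process."""

    def __init__(self, guarded):
        self.guarded = guarded     # whether the init guard is active
        self.global_state = None   # models static C globals holding Python objects
        self.lost_states = []      # states that were overwritten, i.e. leaked

    def module_init(self, interpreter_id):
        """Models CPython calling the module init function for one interpreter."""
        if self.global_state is not None:
            if self.guarded:
                # The guard: refuse to run the init code a second time.
                raise ImportError("module init function was already executed")
            # Without the guard, the previous run's state is silently replaced.
            self.lost_states.append(self.global_state)
        self.global_state = {"created_for": interpreter_id}
        return self.global_state

    def module_cleanup(self):
        """Models freeing all global objects so that init may run again."""
        self.global_state = None


def load_in_interpreters(lib, interpreter_ids):
    """Try to initialise the library for each interpreter; report who succeeded."""
    loaded, refused = [], []
    for interp in interpreter_ids:
        try:
            lib.module_init(interp)
            loaded.append(interp)
        except ImportError:
            refused.append(interp)
    return loaded, refused


# Unguarded: both loads "work", but the first interpreter's state is lost.
unguarded = SharedLibrary(guarded=False)
print(load_in_interpreters(unguarded, ["mgr-a", "mgr-b"]), len(unguarded.lost_states))

# Guarded: only the first load succeeds, the second one is refused.
guarded = SharedLibrary(guarded=True)
print(load_in_interpreters(guarded, ["mgr-a", "mgr-b"]))

# Sequential use: after cleanup, a new init call is accepted again.
guarded.module_cleanup()
print(load_in_interpreters(guarded, ["mgr-b"]))
```

In this model the unguarded library behaves like the setup described in the original report (every load appears to succeed), while the guarded one behaves like the report for Cython 0.29 (only the first load succeeds).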
On 11/12/18 19:39, Stefan Behnel wrote:
> My guess is that your modules just silently leaked object references and memory with the previous Cython versions. That is why we now inserted a guard that detects cases where the module init function is executed multiple times, which would overwrite the state of the previous run. The shared library of an extension module is only loaded once, so any global C state is shared for the entire process, regardless of how often CPython calls the module init function.
I assume that the problem with subinterpreters occurs when a Cython module declares some static/global variables, which might cause undesirable side effects when the module is loaded in several subinterpreters. I believe the Cython modules that we develop in Ceph do not declare any global state, and therefore the modules have been working well when loaded by several subinterpreters.
> I am surprised that your setup didn't crash in any way. Could you explain a bit more how you are using this feature? Are the different subinterpreters running in parallel or sequentially? The ceph repo looks huge. Any pointers where I should start looking?
The subinterpreters run in parallel. Basically, we have a single process, the ceph-mgr daemon, which creates one subinterpreter for each mgr plugin (a plugin is basically a pure Python module) that it finds in a specific location. All these plugins import the "rados" Cython module to be able to talk to the Ceph cluster.

The C++ code that manages the subinterpreters can be found at:
https://github.com/ceph/ceph/tree/master/src/mgr

More specifically, in the files PyModule.* and PyModuleRegistry.*:
https://github.com/ceph/ceph/blob/master/src/mgr/PyModule.cc#L324
> I actually wonder if we could at least support sequential usages through the module cleanup mechanism. Once a module is cleaned up and all global objects freed, calling the module init function again should be ok.
>
> Apart from that, here is the feature ticket for module specific global state:
> https://github.com/cython/cython/issues/2343
>
> Stefan
> _______________________________________________
> cython-devel mailing list
> cython-devel@python.org
> https://mail.python.org/mailman/listinfo/cython-devel
-- Ricardo Dias Senior Software Engineer - Storage Team SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Ricardo Dias wrote on 11.12.18 at 23:16:
> I assume that the problem with subinterpreters occurs when a Cython module declares some static/global variables, which might cause undesirable side effects when the module is loaded in several subinterpreters.
>
> I believe the Cython modules that we develop in Ceph do not declare any global state, and therefore the modules have been working well when loaded by several subinterpreters.
This question already came up recently on the cython-users mailing list (where I'd say it belongs). Since I couldn't find a web version of my reply anywhere outside of Google Groups, I'll copy my reply below:

"""
> We use a lot of scratch interpreters to keep independent tasks isolated from one another, which is when I ran into the error.
In theory, PEP-489 would allow this.
https://www.python.org/dev/peps/pep-0489/

In practice, it's not that easy, because avoiding global state requires a lot of work and makes some things slower, especially access to module globals. It also cannot be done in normal C in all cases, because global cdef functions simply do not have access to non-static module globals, since you cannot pass an additional context into them without changing their signature. Imagine a (statically defined global) C callback function that tries to do a type check against a (module/runtime instance specific) extension type. Which is the right type to check against in that case? Depending on how such a function gets called, there might not even be a thread-local to recover its global module context from. These things could still be done by generating module specific C functions at runtime, but then you're really leaving the platform independent sector of C.

These problems are not specific to Cython. Most CPython extension modules are not prepared to work with multiple interpreters, mostly for the same reasons. These issues can be worked around in some cases by carefully crafting the global state in a way that allows it to be shared across interpreters, but this is such a special case that Cython rather assumes that the module init code is not safe to be re-executed.

And the code that Cython generates internally is also far from PEP-489 clean. I started doing some work towards getting rid of globals here:
https://github.com/cython/cython/pull/1919

And, what a surprise, it's not easy. It turned out that PEP-489 isn't really enough here, so PEP-573 was written to investigate the details and resolve the remaining issues.
https://www.python.org/dev/peps/pep-0573/

Very long story short: Cython detects reloading now and prevents it, rather than crashing in arbitrary places or leaking resources.
The situation will probably improve over time (and help is always appreciated), but that's how things are now.
"""

Stefan
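The callback problem mentioned in the copied reply (a statically defined function doing a type check against a module-instance specific extension type) can be sketched in plain Python. This is a rough analogy under stated assumptions, not real extension code: the names `run_module_init`, `callback`, `Handle` and `_global_handle_type` are hypothetical, and a Python module global stands in for a static C variable.

```python
# Rough analogy in plain Python; all names are hypothetical.

_global_handle_type = None  # models a static C global holding "the" extension type


def run_module_init():
    """Models one execution of the module init code: it creates a fresh type
    and stores it in the single process-wide global."""
    global _global_handle_type

    class Handle:
        pass

    _global_handle_type = Handle
    return Handle


def callback(obj):
    """Models a C callback with a fixed signature: no module context can be
    passed in, so it can only consult the process-wide global."""
    return isinstance(obj, _global_handle_type)


# First interpreter initialises the module and creates an object.
handle_type_a = run_module_init()
obj_a = handle_type_a()
ok_before = callback(obj_a)

# Second interpreter runs the init code again and replaces the global type.
handle_type_b = run_module_init()
ok_after = callback(obj_a)

print(ok_before, ok_after)
```

After the second init run, the callback checks against the second interpreter's type, so an object created through the first run no longer passes the check. In this analogy that is the "which is the right type to check against?" question: the callback has no way of telling which module instance it is being called for.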