[Cython] Python subinterpreters support problem in v0.29

Stefan Behnel stefan_ml at behnel.de
Wed Dec 12 13:37:49 EST 2018


Ricardo Dias wrote on 11.12.18 at 23:16:
> On 11/12/18 19:39, Stefan Behnel wrote:
>> Ricardo Dias wrote on 10.12.18 at 14:42:
>>> The recent Cython 0.29 release introduced a commit [1] that
>>> hinders the use of Python subinterpreters.
>>>
>>> I discovered this the hard way when suddenly a component I was working
>>> on started to crash. The component in question is the ceph-mgr daemon
>>> from the Ceph project [2].
>>>
>>> Python subinterpreters are the basic building block for the
>>> plugin/module architecture of ceph-mgr. Each "manager module" runs in
>>> its own python subinterpreter. Furthermore, all python bindings for the
>>> client libraries of Ceph, such as librados, librbd, libcephfs, and
>>> librgw, are implemented as Cython modules, and in the particular case of
>>> librados, all ceph-mgr plugin modules import the rados Cython module
>>> upon initialization.
>>>
>>> In practice, with Cython 0.29 we can only load one module, because the
>>> following modules will refuse to load.
>>>
>>> After discovering this issue, we "temporarily" worked around it by
>>> restricting the version of Cython as a dependency [3]. But we don't want
>>> to keep this restriction indefinitely and would prefer a fix from the
>>> Cython side.
>>>
>>> Do you think it's feasible to implement a flag to disable the safeguard
>>> introduced in [1]? That way we could re-enable subinterpreters at our
>>> own risk.
>>>
>>> [1]
>>> https://github.com/cython/cython/commit/7e27c7cd51a2f048cd6d3c246740cd977f8d2e50
>>> [2] https://github.com/ceph/ceph
>>> [3] https://github.com/ceph/ceph/pull/25328
>>
>> My guess is that your modules just silently leaked object references and
>> memory with the previous Cython versions. That is why we now inserted a
>> guard that detects cases where the module init function is executed
>> multiple times, which would overwrite the state of the previous run. The
>> shared library of an extension module is only loaded once, so any global C
>> state is shared for the entire process, regardless of how often CPython
>> calls the module init function.
> 
> I assume that the problem with subinterpreters occurs when a cython
> module declares some static/global variables, which might cause
> undesirable side-effects upon module loading in several subinterpreters.
> 
> I believe the Cython modules that we develop in Ceph do not declare any
> global state, and therefore the modules have been working well when
> loaded by several subinterpreters.

This question already came up recently on the cython-users mailing list
(where I'd say it belongs). Since I couldn't find a web version of my reply
anywhere outside of Google Groups, I'll copy my reply below:

"""
> We use a lot of scratch
> interpreters to keep independent tasks isolated from one another which is
> when I ran into the error.

In theory, PEP-489 would allow this.

https://www.python.org/dev/peps/pep-0489/

In practice, it's not that easy, because avoiding global state requires a
lot of work and makes some things slower, especially access to module
globals. It also cannot be done in normal C in all cases, because global
cdef functions simply do not have access to non-static module globals,
since you cannot pass an additional context into them without changing
their signature. Imagine a (statically defined global) C callback function
that tries to do a type check against a (module/runtime instance specific)
extension type. Which is the right type to check against in that case?
Depending on how such a function gets called, there might not even be a
thread-local to recover its global module context from.
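
To make the callback problem concrete, here is a small Python model of it.
It is only a sketch and not code that Cython generates: the names
ModuleInstance, module_init, callback and _current_ext_type are made up
for illustration, and a Python module global stands in for a static C
variable inside a shared library that is loaded once per process.

```python
# Hypothetical model, not Cython output: two "module instances" share one
# process-wide callback, the way two subinterpreters share one loaded
# shared library.

class ModuleInstance:
    """Stands in for the per-interpreter state of an extension module."""

    def __init__(self, name):
        self.name = name
        # Each instance creates its own copy of the extension type.
        self.ext_type = type("ExtType", (), {})


# Process-wide slot, analogous to a static C global holding a type pointer.
_current_ext_type = None


def module_init(name):
    """Analogue of the module init function: overwrites the shared slot."""
    global _current_ext_type
    instance = ModuleInstance(name)
    _current_ext_type = instance.ext_type
    return instance


def callback(obj):
    """Fixed signature: there is no parameter that could carry a module context."""
    return isinstance(obj, _current_ext_type)


first = module_init("interp-1")
obj_from_first = first.ext_type()
print(callback(obj_from_first))   # True: the slot still points at the first type

second = module_init("interp-2")
print(callback(obj_from_first))   # False: the second init replaced the slot
```

The callback has no way to tell which instance it is being called for, so
after the second init run it checks objects from the first instance against
the wrong type.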

These things could still be done by generating module-specific C functions
at runtime, but then you're really leaving the platform-independent part
of C.

These problems are not specific to Cython. Most CPython extension modules
are not prepared to work with multiple interpreters, mostly for the same
reasons. These issues can be worked around in some cases by carefully
crafting the global state in a way that allows it to be shared across
interpreters, but this is such a special case that Cython rather assumes
that the module init code is not safe to be re-executed.

And the code that Cython generates internally is also far from PEP-489
clean. I started doing some work towards getting rid of globals here:

https://github.com/cython/cython/pull/1919

And, what a surprise, it's not easy. It turned out that PEP-489 isn't
really enough here, so PEP-573 was written to investigate the details and
resolve the remaining issues.

https://www.python.org/dev/peps/pep-0573/

Very long story short: Cython detects reloading now and prevents it, rather
than crashing in arbitrary places or leaking resources. The situation will
probably improve over time (and help is always appreciated), but that's how
things are now.
"""

Stefan

