[Cython] Shared Cython runtime

Stefan Behnel stefan_ml at behnel.de
Wed Apr 10 07:24:33 CEST 2013


Nikita Nemkin, 10.04.2013 06:22:
> Robert Bradshaw, 09.04.2013 20:16:
>>>> Yep. We could even create stub .so files for the included ones that
>>>> did nothing but import the big one to avoid having to worry about
>>>> importing things in the right order.
> 
> The stubs don't have to be shared libraries, pure python would work
> just fine (and reduce the complexity).

Sure. That's actually the most common way to do it. Most C modules in
CPython's stdlib have a Python wrapper script, for example. It's also not
uncommon to add a certain amount of functionality in that script. That
makes even just "having it there" a feature for future development.
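
As an illustration of the wrapper-script pattern, here is a minimal sketch in
the style of the stdlib wrappers: pure-Python definitions first, then an
attempt to pull in the compiled versions. The names mypackage._bundle,
fast_sum and mean are made up for this example and do not refer to any
existing code.

```python
# Sketch of a wrapper module, e.g. mypackage/speedups.py (hypothetical name).

def fast_sum(values):
    """Pure-Python fallback implementation."""
    total = 0
    for value in values:
        total += value
    return total


try:
    # Hypothetical compiled bundle. If it is importable, its implementation
    # replaces the pure-Python one defined above.
    from mypackage._bundle import fast_sum
except ImportError:
    pass


def mean(values):
    """Extra functionality that lives only in the wrapper script."""
    values = list(values)
    return fast_sum(values) / len(values)
```

Callers only ever import the wrapper, so the compiled part can be moved
between bundles later without touching user code.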


>>>> The big question is what API to specify that things should be linked
>>>> together. An option to cythonize(...) could be used, but for some
>>>> projects (e.g. Sage) one would want more control over which modules
>>>> get bundled together rather than all-or-nothing.
>>>
>>> No-one forces you to use a plain cythonize("**/*.pyx"). Just be more
>>> specific about what files to include in each pattern that you pass in.
>>> And you can always call cythonize() more than once if you need to. Once for
>>> each meta-module should usually be acceptable, given that the bundled
>>> source modules would tend to have a common configuration anyway.
>>
>> That would still be painful. Ideally, users should never have to
>> modify the setup.py file.
> 
> Users have to plan for what to bundle, what the package structure
> will be, etc. It is not an "enable and forget" type of thing.
> (Unless you have an all-Cython package tree.)

Right. And even in that case, you'd want to make sure that importing one
little module doesn't instantiate loads of useless other modules. That
would just kill your startup time and waste memory. So this is really about
careful layout and planning.
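
One way to get numbers for that kind of planning is to check what a single
import drags in. The following is only a rough sketch using the standard
library; the helper name modules_loaded_by is made up here.

```python
import importlib
import sys


def modules_loaded_by(name):
    """Import `name` and return the modules that this import added to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(name)
    return sorted(set(sys.modules) - before)
```

Run it in a fresh interpreter, e.g. modules_loaded_by("mypackage.small"),
because modules that were already imported earlier will not show up in the
result.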

Maybe compiling whole packages isn't all that good an idea after all, unless
at least most of the modules in that package are actually required for the
core functionality.


> I prefer explicitly creating "bundle" extensions with
> distutils.core.Extension, passing multiple pyx files as sources,
> then passing the result to cythonize.

And I don't see why we should not require users to do it exactly like this.
As you said, it's an explicit thing anyway, so making it explicit in the
setup.py script seems like a good way to express it to me.

Sage as a huge project might be an exception, but I'm sure there are not
many projects of similar size out there. And even in Sage, ISTM that a
single point of explicit packaging configuration would be better than
implicit meta-package building inside of cythonize. In the worst case, it
can always be externalised into some kind of .ini or JSON file and loaded
from there in a generic way.
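
A minimal sketch of what such an externalised configuration could look like,
assuming a JSON mapping from bundle name to .pyx sources. The file content,
the module names and the load_bundles() helper are all hypothetical. Each
resulting dict uses the "name" and "sources" keywords that Extension accepts;
that cythonize() would then build a multi-.pyx Extension as one bundle is the
behaviour under discussion here, not something this sketch relies on.

```python
import json

# Hypothetical content of a "bundles.json" file; all names are made up.
EXAMPLE_CONFIG = """
{
    "mypackage._speedups": ["mypackage/parser.pyx", "mypackage/tree.pyx"],
    "mypackage._io": ["mypackage/reader.pyx"]
}
"""


def load_bundles(text):
    """Turn the JSON mapping into keyword-argument dicts, one per bundle."""
    bundles = json.loads(text)
    owner = {}  # source file -> bundle that already claimed it
    result = []
    for name in sorted(bundles):
        sources = list(bundles[name])
        for source in sources:
            if source in owner:
                raise ValueError("%s is listed in both %s and %s"
                                 % (source, owner[source], name))
            owner[source] = name
        result.append({"name": name, "sources": sources})
    return result
```

In setup.py, each dict could then be passed on as Extension(**kwargs). The
duplicate check is there because a source file should end up in exactly one
bundle.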


> If you really want to push this into cythonize (it already does more
> than it should IMO), one option is to add a new comment directive,
> for example:
> # cython: bundle = mymodule._speedups
> similar to how distutils options are passed.
> All .pyx files with the same "bundle" value will be put in that bundle.

I don't consider it a source level thing. It's a build time packaging
thing, so it should be a build time decision, which makes setup.py the
right place to express it.


>>>>> There may be glitches with stuff like "__file__" and friends, but that
>>>>> should not be any worse than the current way of working.
>>>>
>>>> This can probably be manipulated by Cython, though it's unclear what
>>>> the best value would be.
> 
> The best value is the shared library pathname. All extension modules
> have it like that. The fact that multiple modules share the same .so
> is (luckily) irrelevant to the Python import system.

The problem is not which value to use. The problem is that the shared
library pathname is not known at init time, except inside of the loader,
and it doesn't tell us about it. In the specific case of a meta-module, it
won't even be set until *all* init functions of the submodules have been
called and the main init function has returned.

Which is a really bad situation, because it means that even though the main
module will eventually have its __file__ set by the loader, it has no way
to propagate it to the submodules anymore. So they won't have their
__file__ attribute set at all.
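
The ordering problem can be shown with a pure-Python simulation. This is not
the actual loader code, just a sketch of the sequence described above; the
functions init_bundle() and simulated_load() and all module names and paths
are made up.

```python
import types


def init_bundle():
    """Stand-in for the main init function of a meta-module."""
    main = types.ModuleType("mypackage._bundle")
    submodules = [
        types.ModuleType("mypackage.parser"),
        types.ModuleType("mypackage.tree"),
    ]
    # A real init function would register the submodules in sys.modules here.
    # Nothing can copy __file__ to them, because `main` does not have it yet.
    return main, submodules


def simulated_load(path):
    """Stand-in for the loader: it sets __file__ only after init has returned."""
    main, submodules = init_bundle()
    main.__file__ = path
    return main, submodules


main, submodules = simulated_load("/site-packages/mypackage/_bundle.so")
print(main.__file__)
print([hasattr(module, "__file__") for module in submodules])
```

The main module ends up with its path, while the submodules created during
init never see it.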

Just in case someone missed it, here's the bug URL once again:

http://bugs.python.org/issue13429

Cython can fix up stuff related to the FQMN and sys.modules (and it does
that already), but can't do anything about the file path. That makes things
like resource loading rather annoying.
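
To show why that hurts, here is a sketch of the usual idiom for finding a
data file next to a module. The helper resource_path() is made up for this
example; the point is only that it has nothing to work with when __file__ is
missing.

```python
import os
import types


def resource_path(module, filename):
    """Locate a data file that sits next to `module`, using its __file__."""
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        raise LookupError("%s has no __file__, cannot locate %s"
                          % (module.__name__, filename))
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)
```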

BTW, here's also the python-dev discussion on the topic:

http://thread.gmane.org/gmane.comp.python.devel/135764

Maybe that discussion should be revived and this use case added to it.

Stefan


