Tim Delaney, 28.10.2012 20:48:
On 28 October 2012 18:22, Stefan Behnel wrote:
How much of an effect would it have on startup times and these benchmarks if Cython-compiled extensions were used?
Depends on what and how much code you use. If you compile everything into one big module that "imports" all of the stdlib when it gets loaded, you'd likely lose a lot of time because it would take a while to initialise all that useless code on startup. If you keep it separate, it would likely be a lot faster because you avoid the interpreter for most of the module startup.
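To put numbers on the startup question, something like the following could be used. It is only a rough sketch: the function names are made up for illustration, and it measures wall-clock time of a fresh interpreter process, which is noisy and includes process start-up itself.

```python
import subprocess
import sys
import time


def time_fresh_import(module_name, repeats=5):
    """Best wall-clock time (seconds) to start a new interpreter
    and import `module_name` in it."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.check_call([sys.executable, "-c", "import " + module_name])
        best = min(best, time.perf_counter() - start)
    return best


def import_overhead(module_name, repeats=5):
    """Approximate cost of the import alone: time with the import
    minus the time of a bare interpreter start ("import sys" is
    effectively free because sys is always loaded)."""
    baseline = time_fresh_import("sys", repeats)
    return time_fresh_import(module_name, repeats) - baseline


if __name__ == "__main__":
    for name in ("json", "decimal", "tarfile"):
        print(name, "%.4f s" % import_overhead(name))
```

Running it once against the plain .py modules and once with the compiled extensions in place would give a first impression of the difference, without saying anything about the speed.python.org benchmarks themselves.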
I was specifically thinking in terms of the tests Brett ran (that was the full set on speed.python.org, wasn't it?), and having each stdlib module be its own extension i.e. no big import module. A literal 1:1 replacement where possible.
There's also an intermediate solution of linking the top-N modules into the interpreter core and leaving the rest outside, but I'd rather go for the straightforward approach of having separate libs first.
Compiling all that can be compiled is easy enough. I fixed up a couple of things in Cython (so you need the latest GitHub master) and then ran this setup.py script from the Lib directory with "build_ext -i":
"""
import sys
from distutils.core import setup
from Cython.Build import cythonize
from Cython.Compiler import Options

# improve Python compatibility by allowing some broken code
Options.error_on_unknown_names = False

setup(
    name = 'stuff',
    ext_modules = cythonize(
        ["**/*.py"],
        exclude=['**/test/**/*.py', '**/tests/**/*.py',
                 '**/__init__.py', 'idlelib/MultiCall.py'],
        exclude_failures=True,
        language_level=sys.version_info[0],
        compiler_directives=dict(auto_cpdef=True)
    ),
)
"""
Note that the extra compiler option above disables fatal compile errors on unknown (usually mistyped) names, of which Cython hits a couple in the stdlib. pylint should find them as well; they're worth fixing.
The directive at the end enables automatic module-internal C calls, which usually gives a major speed-up by allowing the C compiler to see what happens.
With the above setup, Cython compiles 612 out of 620 Python modules for me, excluding test modules and __init__.py files. The rest fails to compile due to either compiler bugs or statically detected bugs in the Python code. I'll look through them when I find a bit of time.
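For anyone who wants to check the count on their own build, here is a minimal sketch that walks the Lib directory after an in-place build and reports which candidate modules have an extension module sitting next to them. The helper names are hypothetical, and the exclusion logic only approximates the glob patterns from the setup.py script rather than reusing Cython's own matching.

```python
import os
from importlib.machinery import EXTENSION_SUFFIXES

# Mirrors the one explicitly excluded file from the setup.py script.
EXTRA_EXCLUDES = {"idlelib/MultiCall.py"}


def is_excluded(rel_path):
    """Approximate the exclude patterns: test packages, __init__.py
    files and the explicitly listed module."""
    posix_path = rel_path.replace(os.sep, "/")
    parts = posix_path.split("/")
    if parts[-1] == "__init__.py" or posix_path in EXTRA_EXCLUDES:
        return True
    return "test" in parts[:-1] or "tests" in parts[:-1]


def compile_report(lib_dir):
    """Return (compiled, missing): relative paths of candidate .py
    files with and without a built extension module beside them."""
    compiled, missing = [], []
    for dirpath, _dirnames, filenames in os.walk(lib_dir):
        names = set(filenames)
        for name in filenames:
            if not name.endswith(".py"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), lib_dir)
            if is_excluded(rel):
                continue
            stem = name[:-3]
            if any(stem + suffix in names for suffix in EXTENSION_SUFFIXES):
                compiled.append(rel)
            else:
                missing.append(rel)
    return sorted(compiled), sorted(missing)


if __name__ == "__main__":
    done, failed = compile_report(".")
    print("%d of %d modules compiled" % (len(done), len(done) + len(failed)))
    for path in failed:
        print("not compiled:", path)
```

The "not compiled" list is the interesting part, since those are the modules that hit either a compiler bug or a statically detected problem in the Python code.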
One major problem I ran into is that the new importlib bootstrap module crashes with a RuntimeError("maximum recursion depth exceeded while calling a Python object") when it hits compiled modules with import cycles (e.g. shutil and tarfile, or os and posixpath). I guess that's the kind of corner case you get when working code gets rewritten. Worth giving Py3.2 a try in comparison.
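To find out up front which modules are involved in such cycles, a static scan along these lines might help. This is an illustrative sketch, not what importlib does: it parses the sources with the ast module, treats every import statement (including ones inside functions) as an edge, ignores relative imports, and reports only the first cycle it finds. The function names are invented for this example.

```python
import ast


def imported_modules(source):
    """Names of modules imported anywhere in `source`
    (absolute imports only, including function-level ones)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif (isinstance(node, ast.ImportFrom)
              and node.module and node.level == 0):
            found.add(node.module)
    return found


def find_cycle(graph):
    """Depth-first search over {module: set_of_imports}.
    Returns one cycle as a list of names, or None."""
    in_progress, finished, path = set(), set(), []

    def visit(mod):
        in_progress.add(mod)
        path.append(mod)
        for dep in sorted(graph.get(mod, ())):
            if dep not in graph:
                continue  # module outside the scanned set
            if dep in in_progress:
                return path[path.index(dep):] + [dep]
            if dep not in finished:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(mod)
        finished.add(mod)
        return None

    for mod in sorted(graph):
        if mod not in finished:
            cycle = visit(mod)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Toy sources, not the real stdlib contents.
    sources = {
        "shutil": "import os\ndef f():\n    import tarfile\n",
        "tarfile": "import shutil\n",
        "os": "import sys\n",
    }
    graph = {name: imported_modules(src) for name, src in sources.items()}
    print(find_cycle(graph))
```

Feeding it the real module sources from Lib would give a list of cycles to test against the compiled build one by one.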
To be clear - I'm *not* suggesting Cython become part of the required build toolchain. But *if* the Cython-compiled extensions prove to be significantly faster I'm thinking maybe it could become a semi-supported option (e.g. a HOWTO with the caveat "it worked on this particular system").
I think a stdlib compile script
... see above ...
- pre-packaged hints for the 3.3 release
would likely help both 3.3 and Cython acceptance.
That would certainly be a cool feature. This can often be as easy as putting a .pxd file next to the .py file that overrides the declarations of functions and classes with static types.
Putting aside my development interest and looking at it purely from the PoV of a Python *user*, I'd really like to see Cython on speed.python.org eventually (in two modes - one without hints as a baseline and one with hints).
I think the above setup.py script, with appropriately adapted glob patterns, should do the trick well enough for now. Certainly better and simpler than my initial pyximport configuration. With the obvious caveat that it takes a bit longer to compile everything, not just the modules that are actually used. But that's only an install-time issue.