[Python-Dev] Python 3.3 vs. Python 2.7 benchmark results (again, but this time more solid numbers)

Stefan Behnel stefan_ml at behnel.de
Sun Oct 28 08:22:07 CET 2012


Tim Delaney, 27.10.2012 22:53:
> On 28 October 2012 07:40, Mark Shannon wrote:
>> I suspect that stating and loading the .pyc files is responsible for most
>> of the overhead.
>> PyRun starts up quite a lot faster thanks to embedding all the modules in
>> the executable: http://www.egenix.com/products/python/PyRun/
>>
>> Freezing all the core modules into the executable should reduce start up
>> time.
>
> That suggests a test to me that the Cython guys might be interested in (or
> may well have performed in the past). How much of the stdlib could be
> compiled with Cython and used during the startup process?

We have a Jenkins job set up to run the CPython test suite with a compiled 
stdlib:

https://sage.math.washington.edu:8091/hudson/job/cython-devel-tests-pyregr-stdlib/

Basically, we use pyximport as an import hook that tries to compile Python 
modules on import and then imports the shared library if it worked or the 
original Python module if it failed. A solution that explicitly runs over 
the stdlib and compiles it would be substantially cleaner and more stable.
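
As a rough sketch of what the first step of such an explicit run could look 
like, the following collects the stdlib source files that would be handed to 
the compiler. The function name and the set of skipped directories are 
assumptions made for illustration, not what our Jenkins job does; the actual 
compilation step (e.g. passing the list to Cython) is deliberately left out.

```python
import os
import sysconfig

# Assumption for illustration: skip the test suite, bytecode caches and
# third-party packages. The real exclusion list would need more care.
SKIP_DIRS = {"test", "tests", "__pycache__", "site-packages"}


def stdlib_sources(root=None):
    """Return sorted paths of .py files under root, outside skipped dirs."""
    root = root or sysconfig.get_paths()["stdlib"]
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped dirs.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        found.extend(
            os.path.join(dirpath, name)
            for name in filenames
            if name.endswith(".py")
        )
    return sorted(found)


if __name__ == "__main__":
    print(len(stdlib_sources()), "candidate modules")
```

Each collected file could then be compiled on its own, so that a failure in 
one module does not prevent the others from being built.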

I don't have numbers for Py3.4 because we currently have a hard crash in 
one of the tests on that platform when compiling recursively on import 
(likely meaning that one of the stdlib modules and/or tests would have to 
be excluded from compilation), but I get 434 automatically compiled stdlib 
modules for the latest Py2.7 branch out of 744 (excluding the test suite). 
And Py3.x code tends to pass at least as well through the compiler, often 
better.

Note that quite a number of modules are excluded accidentally because they 
are already imported as Python modules when Cython starts working. 
Compiling them explicitly would remove that limitation, maybe adding 
another (wild guess) 50 modules or so. Another few are not being compiled 
because the test module that uses them fails to compile. So missing shared 
libraries are not always due to failures to compile that particular Python 
module.

I haven't paid much attention to this part of our integration tests so far - 
a bit of debugging should get the Py3.4 build working.


> How much of an
> effect would it have on startup times and these benchmarks if
> Cython-compiled extensions were used?

Depends on what and how much code you use. If you compile everything into 
one big module that "imports" all of the stdlib when it gets loaded, you'd 
likely lose a lot of time because it would take a while to initialise all 
that useless code on startup. If you keep it separate, it would likely be a 
lot faster because you avoid the interpreter for most of the module startup.
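
The effect on startup is easy enough to measure on a given system. A minimal 
sketch, assuming that the best wall-clock time over a few fresh interpreter 
launches is a good enough number; the helper name and its defaults are made 
up for this example:

```python
import subprocess
import sys
import time


def startup_seconds(args=("-c", "pass"), runs=5):
    """Return the best wall-clock time of `runs` fresh interpreter launches."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Launch a new process each time so that no module is already loaded.
        subprocess.check_call([sys.executable, *args])
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("bare startup:   %.4f s" % startup_seconds())
    print("import difflib: %.4f s" % startup_seconds(("-c", "import difflib")))
```

Running this once against a plain installation and once against one with 
compiled modules in place would give a first idea of the difference.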

Most Python code runs about 30% faster when compiled, some faster, some 
slower. If you want better numbers, you can start optimising the code by 
giving Cython static type hints. I did that for difflib a while ago, for 
example. Changing two methods made it some 50% faster back then:

http://blog.behnel.de/index.php?p=155

That particular module should compile without changes these days, and you 
can provide the type hints externally, i.e. without modifying the Python 
code itself.


> I'm thinking here of elimination of .pyc interpretation and execution (stat
> calls would be similar, probably slightly higher).

CPython checks for .so files before looking for .py files and imports are 
absolute by default in Py3, so there should be a slight reduction in stat 
calls. The net result then obviously also depends on how fast your shared 
library loader and linker are, etc., but I doubt that that path is any 
slower than loading and running a .pyc file.
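
To see which variant of a module the import system actually picks up, 
something like the following sketch can help. It assumes Python 3.4 or later 
(for importlib.util.find_spec), and the helper name and its return strings 
are invented for this example.

```python
import importlib.machinery
import importlib.util


def module_kind(name):
    """Classify how the import system would load the named module."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"  # shared library, e.g. a Cython-compiled module
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "source"  # plain .py file (run from its cached bytecode)
    # Anything else, e.g. "built-in" or "frozen" modules.
    return origin or "unknown"


if __name__ == "__main__":
    for name in ("json", "difflib", "sys"):
        print(name, "->", module_kind(name))
```

If a compiled difflib shared library sits next to difflib.py, this should 
report "extension" for it instead of "source".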

BTW, you'd still get nice stack traces for compiled modules as long as your 
.py files lie right next to your .so files.


> To be clear - I'm *not* suggesting Cython become part of the required build
> toolchain. But *if* the Cython-compiled extensions prove to be
> significantly faster I'm thinking maybe it could become a semi-supported
> option (e.g. a HOWTO with the caveat "it worked on this particular system").

Sounds reasonable.

Stefan



