[Python-Dev] Where the speed is lost! (was: 1.6 speed)

Christian Tismer tismer@trixie.triqs.com
Mon, 24 Apr 2000 16:01:26 +0200


> Christian Tismer wrote:
> >
> > "A.M. Kuchling" wrote:
> > >
> > > Python 1.6a2 is around 10% slower than 1.5 on pystone.
> > > Any idea why?
...
> > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2.
> >
> > Claim:
> > This is not related to ceval.c .
> > Something else must have introduced a significant speed loss.

I guess I can explain now what's happening, at least
for the Windows platform.
Python 1.5.2's .dll was only slightly larger than 512K.
If I remember correctly, 512K is a common size for the secondary
cache.
Now, linking with the MS linker does not give you any
particularly useful order of modules. When I look into
the map file, the modules appear sorted by name.
This surely does not give optimum performance.
As I read the docs, explicit ordering of the linkage
would only make sense for C++ and wouldn't work out
for C, since we could order the exported functions, but
not the private ones, giving even more distance between
related code.

To test whether I might be right, I did this:
I ripped out almost all builtin extension modules
and compiled/linked without them. This shrank
the dll size from 647K to 557K, very close
to the 1.5.2 size.
Now I get the following figures:

Python 1.6, with stackless patches:

D:\python\spc\Python-slp\PCbuild>python /python/lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.95468
This machine benchmarks at 5115.92 pystones/second

Python 1.6, from the dist:

D:\Python16>python /python/lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 2.09214
This machine benchmarks at 4779.8 pystones/second

That means my optimizations take effect again,
now that the overall code size has gone below about 512K.
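
For anyone who wants to check the arithmetic, here is a small sketch that
recomputes the rates from the two runs quoted above and compares the dll
sizes against 512K. The labels and helper name are made up for illustration,
and the 512K figure is only the assumed cache size from this mail, not a
measured value. For these two particular runs the ratio comes out at
roughly 1.07.

```python
# Figures copied from the two pystone runs quoted above.
PASSES = 10000
runs = {
    "1.6 + stackless patches, reduced dll": 1.95468,  # seconds
    "1.6 from the dist": 2.09214,                     # seconds
}

def pystones_per_second(passes, seconds):
    # pystone reports the number of passes divided by the elapsed time
    return passes / seconds

rates = {name: pystones_per_second(PASSES, t) for name, t in runs.items()}
for name, rate in rates.items():
    print("%-40s %8.1f pystones/second" % (name, rate))

# Ratio of the two rates for these two runs only
speedup = rates["1.6 + stackless patches, reduced dll"] / rates["1.6 from the dist"]
print("ratio: %.3f" % speedup)

# dll sizes in K as quoted above, compared with the assumed 512K cache size
CACHE_K = 512
for label, size_k in (("full dll", 647), ("reduced dll", 557)):
    print("%-12s %dK, %dK over %dK" % (label, size_k, size_k - CACHE_K, CACHE_K))
```

A single pair of runs is noisy, so the ratio should be read as a rough
indication rather than a precise measurement.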

I think these 10 percent are quite valuable.
These options come to my mind:

a) try to do optimum code ordering in the too-large .dll.
   This seems to be hard to achieve.
b) Split the dll into two dlls in a way that all the
   necessary internal stuff sits closely together in one of them.
c) try to split the library like above, but use
   a static library layout for one of them, and link the
   static library into the final dll. This would hopefully
   keep related things together.

I don't know if c) is possible, but it might be worth a try.

Any thoughts?

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint       E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com