[Python-Dev] Optimization targets

Thu Apr 15 09:36:13 EDT 2004

Hi,

mwh wrote:
> > (x_divmod is the hog, not l_divmod).
> 
> Probably a fine candidate function for rewriting in assembly too...

As a data point: I once had the dubious pleasure of writing a long-integer
library for cryptography. Hand-crafted x86 assembler outperformed plain
(but carefully optimized) C code by a factor of 2 to 3.

But Python's long-int code is a lot slower than e.g. gmp (factor 15-25
for mul/div, factor 100 for modular exponentiation).
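
For anyone who wants to reproduce the Python side of such a comparison, here
is a minimal timing sketch. It assumes a modern Python 3 and only the standard
timeit module; the function name time_long_ops, the operand sizes and the
repeat count are my own arbitrary choices, and it measures Python's long-int
code only (no gmp numbers are produced).

```python
import timeit


def time_long_ops(bits=2048, repeat=200):
    """Return average seconds per operation for mul, divmod and modular pow."""
    # Deterministic operands of roughly `bits` bits; the values are arbitrary.
    a = (1 << bits) - 159
    b = (1 << (bits // 2)) + 297
    m = (1 << bits) - 1557
    env = {"a": a, "b": b, "m": m}
    stmts = {
        "mul": "a * a",
        "divmod": "divmod(a, b)",
        "powmod": "pow(b, a, m)",
    }
    return {
        name: timeit.timeit(stmt, globals=env, number=repeat) / repeat
        for name, stmt in stmts.items()
    }


if __name__ == "__main__":
    for name, secs in time_long_ops().items():
        print(f"{name:8s} {secs * 1e6:10.2f} us/op")
```

The same three operations can then be timed in whatever other library you
want to compare against, using operands of the same size.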

I assume the difference between C and assembler is less pronounced with
other processors.

The register pressure issue may soon be a moot point with x86-64, though.
It has been shown that 64-bit pointers slow things down a bit, but compilers
just love the extra registers (R8-R15).

> > But GCC has more to offer: read the man page entries for -fprofile-arcs
> > and -fbranch-probabilities. Here is a short recipe:
> 
> I tried this on the ibook and I found that it made a small difference
> *on the program you ran to generate the profile data* (e.g. pystone),
> but made naff all difference for something else.  I can well believe
> that it makes more difference on a P4 or G5.

On x86, even profile data gathered from python -c 'pass' makes a major difference.
And the speed-ups are applicable to almost any program, since the
branch predictions for eval_frame and lookdict_string affect all
Python programs.

I'm currently engaged in a private e-mail conversation with Raymond
on how to convince GCC to generate good code on x86 without the help
of profiling.

> I wrote a rant about improving Python's performance, which I've
> finally got around to uploading:
> 
>     http://starship.python.net/crew/mwh/hacks/speeding-python.html
> 
> Tell me what you think!

About GC: yes, refcounting is the silent killer. But there's a lot to
optimize even without discarding refcounting. E.g. the code generated
for Py_DECREF is awful (spread across >3500 locations) and
PyObject_GC_UnTrack needs some work, too.
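
To make the refcounting traffic visible from the Python side, here is a small
sketch using sys.getrefcount. It assumes CPython (other implementations need
not refcount at all), and the helper name refcount_deltas is made up for
illustration. It reports only differences between counts, since the absolute
numbers depend on the interpreter version.

```python
import sys


def refcount_deltas():
    """Show how ordinary statements change an object's reference count."""
    obj = object()
    base = sys.getrefcount(obj)

    alias = obj                      # one new reference
    after_alias = sys.getrefcount(obj)

    box = [obj, obj]                 # the list holds two more references
    after_list = sys.getrefcount(obj)

    del box, alias                   # every dropped reference is a decrement
    after_del = sys.getrefcount(obj)

    return (after_alias - base, after_list - after_alias, after_del - base)


if __name__ == "__main__":
    print(refcount_deltas())
```

Each of those increments and decrements corresponds to a Py_INCREF or
Py_DECREF somewhere in the interpreter, which is why the quality of the
code generated for them matters so much.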

About psyco: I think it's wonderful. But, you are right, nobody is using it.
Why? Simple: It's not 'on' by default.

About type inference: maybe the right way to go with Python is lazy and
pure runtime type inference? Close to what psyco does.

About type declarations: this is too contrary to the Pythonic way of
thinking. And before we start to implement this, we should make sure
that it's a lot faster than a purely dynamic type inference approach.

About PyPy: very interesting, will take a closer look. But there's still
a long road ahead ...

Bye,
     Mike