[issue4715] optimize bytecode for conditional branches

Thu Feb 26 06:54:48 CET 2009

Jeffrey Yasskin <jyasskin at gmail.com> added the comment:

The numbers are:

Intel Core 2, gcc-4.3, 32-bit
2to3:
25.24 -> 24.89: 1.38% faster

Django:
Min: 0.618 -> 0.607: 1.90% faster
Avg: 0.621 -> 0.615: 1.04% faster

PyBench:
Min: 5324 -> 5280: 0.83% faster
Avg: 5456 -> 5386: 1.30% faster

Pickle:
Min: 1.424 -> 1.376: 3.48% faster
Avg: 1.427 -> 1.378: 3.55% faster

Spitfire:
Min: 0.701 -> 0.718: 2.32% slower
Avg: 0.710 -> 0.721: 1.47% slower

Unpickle:
Min: 0.667 -> 0.651: 2.33% faster
Avg: 0.668 -> 0.652: 2.38% faster

Intel Core 2, gcc-4.3, 64-bit

2to3:
22.40 -> 22.59: 0.81% slower

Django:
Min: 0.575 -> 0.565: 1.74% faster
Avg: 0.577 -> 0.567: 1.76% faster

PyBench:
Min: 4332 -> 4433: 2.28% slower
Avg: 4393 -> 4519: 2.79% slower

Pickle:
Min: 1.177 -> 1.204: 2.25% slower
Avg: 1.180 -> 1.205: 2.14% slower

Spitfire:
Min: 0.622 -> 0.629: 1.22% slower
Avg: 0.623 -> 0.631: 1.26% slower

Unpickle:
Min: 0.576 -> 0.563: 2.25% faster
Avg: 0.596 -> 0.564: 5.55% faster

On my MacBook, gcc-4.0, 32-bit:
2to3:
29.82 -> 29.39: 1.46% faster

Django:
Min: 0.727 -> 0.720: 0.98% faster
Avg: 0.746 -> 0.736: 1.45% faster

PyBench:
Min: 6303 -> 6432: 2.01% slower
Avg: 6471 -> 6563: 1.40% slower

Pickle:
Min: 1.564 -> 1.564: 0.00% faster
Avg: 1.609 -> 1.592: 1.07% faster

Spitfire:
Min: 0.902 -> 0.909: 0.78% slower
Avg: 0.924 -> 0.920: 0.41% faster

Unpickle:
Min: 0.784 -> 0.763: 2.73% faster
Avg: 0.794 -> 0.776: 2.26% faster
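
The percentages in these tables appear to follow an asymmetric
convention: a speedup is reported as old/new - 1 and a slowdown as
1 - old/new. The sketch below is inferred from the figures themselves,
not taken from the benchmark harness, and the function name
percent_change is hypothetical.

```python
def percent_change(old, new):
    """Return (percentage, label) for a timing change from old to new.

    Assumed convention, inferred from the reported figures:
    faster -> old/new - 1, slower -> 1 - old/new.
    """
    if new <= old:
        return (old / new - 1.0) * 100.0, "faster"
    return (1.0 - old / new) * 100.0, "slower"


# Two rows taken from the figures in this comment.
for old, new in [(196.0, 172.0), (4332.0, 4433.0)]:
    pct, label = percent_change(old, new)
    print("%.3f -> %.3f: %.2f%% %s" % (old, new, pct, label))
```

The tables print rounded timings, so recomputing from the displayed
values will not match every row exactly.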

The performance isn't as good as I'd like, especially on 64-bit. I
suspect the difference from the py3k branch is that trunk doesn't have
Antoine's dispatch patch, and POP_TOP is predicted after
JUMP_IF_{TRUE,FALSE}. Without computed-goto dispatch, this patch
therefore usually saves only a predictable if(). The skipped
JUMP_ABSOLUTEs may not occur often enough in my benchmarks to matter
much.

On the other hand, "./python.exe -m timeit -s 'x=range(500)' '[y+3 for y
in x if y%5 <2]'" shows the following differences on my MacBook:

For py3k:
Min: 196.000 -> 172.000: 13.95% faster
Avg: 200.000 -> 178.600: 11.98% faster
Significant (t=5.339997, a=0.95)

For trunk:
Min: 108.000 -> 88.200: 22.45% faster
Avg: 114.571 -> 97.571: 17.42% faster
Significant (t=5.518236, a=0.95)
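
For anyone who wants to repeat the microbenchmark from a script rather
than the shell, a rough equivalent using the timeit module might look
like this. The helper name time_comprehension, the repeat and number
values, and the microsecond scaling are my own assumptions, not the
settings behind the numbers above.

```python
import timeit

# Same setup and statement as the command line above. Note that
# range(500) is a list on Python 2 and a lazy range object on Python 3.
SETUP = "x=range(500)"
STMT = "[y+3 for y in x if y%5 <2]"


def time_comprehension(repeat=5, number=1000):
    """Return the per-loop time in microseconds for each repeat."""
    raw = timeit.repeat(STMT, setup=SETUP, repeat=repeat, number=number)
    return [t / number * 1e6 for t in raw]


times = time_comprehension()
print("Min: %.3f  Avg: %.3f" % (min(times), sum(times) / len(times)))
```

Run it once against an unpatched interpreter and once against a patched
one to get a before/after pair.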

That list comprehension definitely takes advantage of skipping the
JUMP_ABSOLUTE.
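
To see which jumps a given interpreter emits for that comprehension,
the dis module can list them. This is only a sketch: the helper name
jump_opnames is hypothetical, dis.get_instructions needs Python 3.4 or
later, and the opcode names printed depend on the interpreter version,
so the output will not necessarily match either build measured here.

```python
import dis
import types

SOURCE = "[y+3 for y in x if y%5 <2]"


def jump_opnames(code_obj):
    """Collect jump opcode names from code_obj and nested code objects."""
    names = [ins.opname for ins in dis.get_instructions(code_obj)
             if "JUMP" in ins.opname]
    # Depending on the version, the comprehension body may live in a
    # nested code object stored in co_consts, so recurse into those.
    for const in code_obj.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(jump_opnames(const))
    return names


code = compile(SOURCE, "<comprehension>", "eval")
print(jump_opnames(code))
```

Calling dis.dis(code) instead prints the full listing with jump
targets, which shows where the filter's false branch goes.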

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4715>
_______________________________________