[Speed] Latest enhancements of perf 0.8.1 and performance 0.3.1

Victor Stinner victor.stinner at gmail.com
Wed Nov 2 11:53:45 EDT 2016


2016-11-02 15:20 GMT+01:00 Armin Rigo <armin.rigo at gmail.com>:
> Is that really the kind of examples you want to put forward?

I am not a big fan of timeit, but we must sometimes use it for
micro-optimizations in CPython, to check whether an optimization
really makes CPython faster or not. I am only trying to enhance
timeit. Understanding the results requires understanding how the
statements are executed.
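
For example, a minimal check of this kind with the stdlib timeit
module might look something like this (the statement and the numbers
are just placeholders):

import timeit

# Time "x + y" on the unpatched and the patched interpreter and
# compare; min() of several runs reduces the noise a little
best = min(timeit.repeat("x + y", setup="x = 1; y = 2",
                         repeat=5, number=10**6))
print("%.1f ns per loop" % (best / 10**6 * 1e9))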


> This example means "compare CPython where the data cache gets extra pressure from reading a strangely large code object,

I wrote the --duplicate option to benchmark "x+y" with "x=1; y=2". I
know it's an extreme and stupid benchmark, but many people have spent
a lot of time trying to optimize this in Python/ceval.c:
https://bugs.python.org/issue21955
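
For example, with perf's timeit command, something like:

python3 -m perf timeit --duplicate=100 -s "x = 1; y = 2" "x + y"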

I tried multiple values of --duplicate when benchmarking x+y, and x+y
seems "faster" with a larger --duplicate value. My understanding is
that the cost of the outer loop is higher than the cost of "reading a
strangely large code object".
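
The effect can be reproduced with the stdlib timeit module by
duplicating the statement in the timed code by hand. This is only a
sketch of the idea, not how perf actually implements --duplicate:

import timeit

stmt = "x + y"
setup = "x = 1; y = 2"
number = 100000

for duplicate in (1, 10, 100):
    # Repeat the statement inside the timed code so that the fixed
    # cost of the outer loop is spread over more executions of stmt
    body = "\n".join([stmt] * duplicate)
    dt = timeit.timeit(body, setup=setup, number=number)
    print("duplicate=%3d: %.2f ns per statement"
          % (duplicate, dt / (number * duplicate) * 1e9))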

I provide a tool and I try to document how to use it. But it's hard
to prevent users from using it for stupid things.

For example, recently I spent time trying to optimize bytes%args in
Python 3 after reading an article, but then I realized that the Python
2 benchmark was meaningless:
https://atleastfornow.net/blog/not-all-bytes/

def bytes_plus():
    b"hi" + b" " + b"there"
... benchmark(bytes_plus) ...

bytes_plus() is optimized by the _compiler_, which folds the constant
concatenation into a single constant, so the benchmark only measures
the cost of LOAD_CONST :-)
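
You can see the constant folding with the dis module (on a CPython
where the peephole optimizer folds small constant sequences):

import dis

def bytes_plus():
    b"hi" + b" " + b"there"

# The expression compiles to a single LOAD_CONST of b"hi there"
# followed by POP_TOP (plus the implicit "return None")
dis.dis(bytes_plus)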

The issue was not the tool but the usage of the tool :-D

Victor
