I'm still working on analyzing past optimizations to guide future optimizations. I managed to identify multiple significant optimizations over the last 3 years. At least for me, some were unexpected, like "Use the test suite for profile data", which made pidigits 1.16x faster.
Here is a report on my work over the last few weeks.
I managed to run benchmarks on CPython master for the period April 2014 to April 2017: we now have a timeline covering 3 years of CPython performance!
I started to take notes on significant performance changes (speedup and slowdown) of this timeline:
To identify the change which introduced a significant performance change, I wrote a Python script running a Git bisection: compile CPython, run benchmark, repeat.
It uses a configuration file which looks like:
[config]
work_dir = ~/prog/bench_python/bisect-pickle
src_dir = ~/prog/bench_python/master
old_commit = 133138a284be1985ebd9ec9014f1306b9a42
new_commit = 10427f44852b6e872034061421a8890902b8f
benchmark = ~/prog/bench_python/performance/performance/benchmarks/bm_pickle.py pickle
benchmark_opts = --inherit-environ=PYTHONPATH -p5 -v
configure_args =
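To make the bisection idea concrete, here is a minimal sketch and not the actual script. It assumes the configuration is plain INI readable by configparser, and that the performance change is a single step somewhere between old_commit and new_commit. The names load_config, bisect_perf_change and the measure callback are hypothetical; in a real run, measure would check out a commit, compile CPython and run the benchmark, whereas here it is left to the caller.

```python
import configparser
import os.path
import shlex

# Sample configuration, copied from the example above.
SAMPLE_CONFIG = """\
[config]
work_dir = ~/prog/bench_python/bisect-pickle
src_dir = ~/prog/bench_python/master
old_commit = 133138a284be1985ebd9ec9014f1306b9a42
new_commit = 10427f44852b6e872034061421a8890902b8f
benchmark = ~/prog/bench_python/performance/performance/benchmarks/bm_pickle.py pickle
benchmark_opts = --inherit-environ=PYTHONPATH -p5 -v
configure_args =
"""


def load_config(text):
    """Parse the [config] section into a dict (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["config"]
    # "benchmark" holds a script path followed by its arguments.
    benchmark = shlex.split(section["benchmark"])
    benchmark[0] = os.path.expanduser(benchmark[0])
    return {
        "work_dir": os.path.expanduser(section["work_dir"]),
        "src_dir": os.path.expanduser(section["src_dir"]),
        "old_commit": section["old_commit"],
        "new_commit": section["new_commit"],
        "benchmark": benchmark,
        "benchmark_opts": shlex.split(section["benchmark_opts"]),
        "configure_args": shlex.split(section["configure_args"]),
    }


def bisect_perf_change(commits, measure, threshold=1.05):
    """Return the first commit whose timing differs from commits[0].

    commits: list ordered from oldest to newest.
    measure: callable(commit) -> timing in seconds (assumed: compile
             CPython at that commit, run the benchmark, return the result).
    threshold: minimum ratio considered significant (an arbitrary choice).
    Returns None if the newest commit shows no significant change.
    """
    baseline = measure(commits[0])

    def changed(commit):
        ratio = measure(commit) / baseline
        return ratio > threshold or ratio < 1.0 / threshold

    lo, hi = 0, len(commits) - 1
    if not changed(commits[hi]):
        return None
    # Invariant: commits[lo] behaves like the baseline, commits[hi] does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if changed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With N commits between the two endpoints, this needs about log2(N) compile-and-benchmark cycles, which is what makes bisecting 3 years of history practical. Noisy benchmarks can mislead a single comparison, so the threshold should be chosen well above the measured noise.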
I managed to identify many significant optimizations (TODO: validate them on the speed-python server). Examples:
- PyMem_Malloc() now uses the fast pymalloc allocator
- Add a C implementation of collections.OrderedDict
- Use the test suite for profile data
- Speedup method calls 1.2x
- Added C implementation of functools.lru_cache()
- Optimized ElementTree.iterparse(); it is now 2x faster
perf, performance, the server configuration, etc. evolve more quickly than expected, so I created a Git project to keep a copy of the JSON files:
I already lost the data of my first milestone (November-December 2016), but you have data from the second (December 2016-February 2017) and third (March 2017-today) milestones.
I'm now discussing with the PyPy developers to see how the performance benchmark suite could be used to measure PyPy performance.