[Python-Dev] Avoiding CPython performance regressions

Tue Dec 1 05:14:40 EST 2015

On Tue, Dec 1, 2015 at 11:49 AM, Fabio Zadrozny <fabiofz at gmail.com> wrote:
>
> On Tue, Dec 1, 2015 at 6:36 AM, Maciej Fijalkowski <fijall at gmail.com> wrote:
>>
>> Hi
>>
>> Thanks for doing the work! I'm one of the PyPy devs and I'm very
>> interested in seeing this getting somewhere. I must say I struggle to
>> read the graph - is red good or is red bad for example?
>>
>> I'm keen to help you get anything you need to run it repeatedly.
>>
>> PS. The intel stuff runs one benchmark in a very questionable manner,
>> so let's maybe not rely on it too much.
>
>
> Hi Maciej,
>
> Great, it'd be awesome having data on multiple Python VMs (my latest target
> is really having a way to compare across multiple VMs/versions easily and
> help each implementation keep a focus on performance). Ideally, a single,
> dedicated machine could be used just to run the benchmarks from multiple VMs
> (one less variable to take into account for comparisons later on, as I'm not
> sure it'd be reliable to normalize benchmark data from different machines --
> it seems Zach was the one to contact for that, but if there's such a
> machine already being used to run PyPy, maybe it could be extended to run
> other VMs too?).
>
> As for the graph, it should be easy to customize (and I'm open to
> suggestions). As it is, red is slower and blue is faster (so,
> for instance, in
> https://www.speedtin.com/reports/1_CPython27x_Performance_Over_Time,  the
> fastest CPython version overall was 2.7.3 -- and 2.7.1 was the baseline).
> I've updated the comments to make it clearer, and changed the second graph
> to compare the latest against the fastest version (2.7.rc11 vs 2.7.3) for
> the individual benchmarks.
>
> Best Regards,
>
> Fabio

There is definitely a machine available. I suggest you ask
python-infra list for access. It definitely can be used to run more
than just pypy stuff. As for normalizing across multiple machines -
don't even bother. Different architectures make A LOT of difference,
especially in cache sizes and whatnot, which seem to have a different
impact on different workloads.

As for the graph - I like the per-benchmark split, and a better
description (e.g. "higher is better") would be good.

I have a lot of ideas about visualizations, pop in on IRC, I'm happy
to discuss :-)

Cheers,
fijal