On Tue, Dec 1, 2015 at 8:14 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Tue, Dec 1, 2015 at 11:49 AM, Fabio Zadrozny <fabiofz@gmail.com> wrote:
>
> On Tue, Dec 1, 2015 at 6:36 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
>>
>> Hi
>>
>> Thanks for doing the work! I'm one of the PyPy devs and I'm very
>> interested in seeing this get somewhere. I must say I struggle to
>> read the graph - is red good or bad, for example?
>>
>> I'm keen to help you get whatever you need to run it repeatedly.
>>
>> PS. The intel stuff runs one benchmark in a very questionable manner,
>> so let's maybe not rely on it too much.
>
>
> Hi Maciej,
>
> Great, it'd be awesome to have data on multiple Python VMs (my latest goal
> is having a way to easily compare across multiple VMs/versions and to help
> each implementation keep a focus on performance). Ideally, a single,
> dedicated machine would be used just to run the benchmarks for all the VMs
> (one less variable to account for in later comparisons, as I'm not sure it'd
> be reliable to normalize benchmark data from different machines). It seems
> Zach was the one to contact about that, but if there's already such a
> machine being used to run PyPy, maybe it could be extended to run other
> VMs too?
>
> As for the graph, it should be easy to customize (and I'm open to
> suggestions). As it stands, red is slower and blue is faster (so, for
> instance, in
> https://www.speedtin.com/reports/1_CPython27x_Performance_Over_Time, the
> fastest CPython version overall was 2.7.3 -- and 2.7.1 was the baseline).
> I've updated the comments to make that clearer, and changed the second
> graph to compare the latest version against the fastest (2.7.rc11 vs
> 2.7.3) on the individual benchmarks.
>
> Best Regards,
>
> Fabio

There is definitely a machine available. I suggest you ask the
python-infra list for access. It can definitely be used to run more
than just PyPy stuff. As for normalizing across multiple machines:
don't even bother. Different architectures make A LOT of difference,
especially cache sizes and whatnot, and that seems to have a
different impact on different workloads.

As for the graph - I like the split across benchmarks, and a better
description (e.g. "higher is better") would be good.

I have a lot of ideas about visualizations - pop in on IRC, I'm happy
to discuss :-)


Ok, I mailed infrastructure(at)python.org to see how to make it work.

I've added a legend now, so it should be much easier to read already ;)

As for ideas on visualizations, I definitely want to hear suggestions on how to improve them. That said, I'll first focus on getting the servers that collect benchmark data up and running, and will move on to improving the graphs right afterwards.
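To make the baseline comparison and the "higher is better" convention concrete, here's a minimal sketch of the kind of normalization being discussed (the benchmark names and timings are made up for illustration, not taken from the actual SpeedTin data):

```python
# Hypothetical example: normalizing per-benchmark timings against a
# baseline interpreter version, as in the report that uses 2.7.1 as
# the baseline.  All numbers here are invented.
timings = {
    "2.7.1": {"nbody": 10.0, "richards": 4.0},  # baseline version
    "2.7.3": {"nbody": 8.0, "richards": 3.2},
}

baseline = timings["2.7.1"]

def speedups(version):
    # Ratio of baseline time to this version's time:
    # > 1.0 means faster than the baseline ("higher is better"),
    # < 1.0 means slower.
    return {bench: baseline[bench] / t
            for bench, t in timings[version].items()}

print(speedups("2.7.3"))  # both benchmarks: 1.25x faster
```

Note that this normalizes against a baseline *version on the same machine*; as pointed out above, normalizing across different machines is not reliable.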

Cheers,

Fabio


Cheers,
fijal