[pypy-dev] New speed.pypy.org version

Maciej Fijalkowski fijall at gmail.com
Fri Jun 25 17:53:45 CEST 2010


Hey Paolo.

While in general I agree with you that this is not exactly science, I
still think it gives outsiders some impression of what's going on.
Internally, we still look mostly at particular benchmarks. I'm not
sure that a summary metric that is convoluted (at least to normal
people) would help. Speaking a bit on Miquel's behalf: feel free to
implement this as a feature in codespeed (it's an open source project,
after all); you can fork it on GitHub:
http://github.com/tobami/codespeed

Cheers,
fijal

On Fri, Jun 25, 2010 at 6:07 AM, Paolo Giarrusso <p.giarrusso at gmail.com> wrote:
> Hi!
> First, I want to restate the obvious, before pointing out what I think
> is a mistake: your work on this website is great and very useful!
>
> On Fri, Jun 25, 2010 at 13:08, Miquel Torres <tobami at googlemail.com> wrote:
>> - stacked bars
> Here you are summing up normalized times, which is more or less like
> taking their arithmetic average. And that doesn't work at all: in many
> cases you can "show" completely different results by normalizing
> relative to another item. Even the simple question "who is faster?"
> can be answered in different ways.
> So you should use the geometric mean, even if this is not so widely
> known. Or rather, it is known by benchmarking experts, but it's
> difficult to become one.
>
> Please, have a look at the short paper:
> "How not to lie with statistics: the correct way to summarize benchmark results"
> http://scholar.google.com/scholar?cluster=1051144955483053492&hl=en&as_sdt=2000
> I downloaded it from the ACM library, please tell me if you can't find it.
>
>> horizontal (http://speed.pypy.org/comparison/?hor=true&bas=2%2B35&chart=stacked+bars):
>> This is not meant to "demonstrate" that overall the jit is over two times
>> faster than cpython. It is just another way for a developer to picture how
>> long a programme would take to complete if it were composed of 21 such
>> tasks.
>
> You are not summing up absolute times, so your claim is incorrect. And
> the error is significant, given the above paper.
> A sum of absolute times would provide what you claim.
>
>> You can see that cpython's (the normalization chosen) benchmarks all
>> take 1 "relative" second.
> Here, for instance, I see that CPython and pypy-c take more or less
> the same time, which surprises me (since the PyPy interpreter was
> known to be slower than CPython). But given that the result is
> invalid, it may well be an artifact of your statistics.
>
>> pypy-c needs more or less the same time, some
>> "tasks" being slower and some faster. Psyco shows an interesting picture:
>> From meteor-contest downwards (fortuitously), all benchmarks are extremely
>> "compressed", which means they are sped up by psyco quite a lot. But any
>> further speed up wouldn't make overall time much shorter because the first
>> group of benchmarks now takes most of the time to complete. pypy-c-jit is a
>> more extreme case of this: If the jit accelerated all "fast" benchmarks to 0
>> seconds (infinitely fast), it would only get about twice as fast as now
>> because ai, slowspitfire, spambayes and twisted_tcp now need half the entire
>> execution time. A good demonstration of "you are only as fast as your
>> slowest part". Of course the aggregate of all benchmarks is not a real app,
>> but it is still fun.
>
> This may still be true, at least in part, but you have to do
> this reasoning on absolute times.
>
> Best regards, and keep up the good work!
> --
> Paolo Giarrusso - Ph.D. Student
> http://www.informatik.uni-marburg.de/~pgiarrusso/
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>
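
To make the point about normalization concrete, here is a minimal
sketch in Python. The timings are made up for illustration (they are
not speed.pypy.org data), and the names "A", "B" and the helper
functions are hypothetical. It shows that the arithmetic mean of
normalized times can call either interpreter the slower one depending
on the baseline, while the geometric mean gives the same verdict
either way.

```python
import math

# Hypothetical timings in seconds for two benchmarks (illustrative only,
# not measured data).
times_a = [1.0, 4.0]   # interpreter "A"
times_b = [2.0, 1.0]   # interpreter "B"


def normalize(ts, baseline):
    """Divide each timing by the baseline's timing for the same benchmark."""
    return [t / b for t, b in zip(ts, baseline)]


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # exp of the mean of logs; all inputs must be positive.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


b_vs_a = normalize(times_b, times_a)   # B's times with A as baseline
a_vs_b = normalize(times_a, times_b)   # A's times with B as baseline

# Arithmetic mean of normalized times: both come out above 1, so each
# interpreter looks "slower" than the other, depending on the baseline.
am_b_vs_a = arithmetic_mean(b_vs_a)    # 1.125
am_a_vs_b = arithmetic_mean(a_vs_b)    # 2.25

# Geometric mean: the two results are reciprocals of each other, so the
# answer to "who is faster?" does not depend on the baseline.
gm_b_vs_a = geometric_mean(b_vs_a)
gm_a_vs_b = geometric_mean(a_vs_b)

# Sum of absolute times, which is what "time to run all tasks" means.
total_a = sum(times_a)
total_b = sum(times_b)

print("arithmetic:", am_b_vs_a, am_a_vs_b)
print("geometric: ", gm_b_vs_a, gm_a_vs_b)
print("absolute:  ", total_a, total_b)
```

Stacking normalized bars is equivalent (up to a constant factor) to the
arithmetic mean above, which is why the chart can change its story when
a different baseline is selected.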


