<br><br><div class="gmail_quote">On Fri, Apr 3, 2009 at 11:27, Antoine Pitrou <span dir="ltr"><<a href="mailto:solipsis@pitrou.net">solipsis@pitrou.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im">Thomas Wouters <thomas <at> <a href="http://python.org" target="_blank">python.org</a>> writes:<br>
><br>
><br>
> Pystone is pretty much a useless benchmark. If it measures anything, it's the<br>
speed of the bytecode dispatcher (and it doesn't measure it particularly well.)<br>
PyBench isn't any better, in my experience.<br>
<br>
</div>I don't think pybench is useless. It gives a lot of performance data about<br>
crucial internal operations of the interpreter. It is of course not very<br>
real-world, but conversely it lets you know immediately where a performance<br>
regression has happened. (By contrast, if you witness a regression in a<br>
high-level benchmark, you still have a lot of investigation to do to find out<br>
where exactly something bad happened.)</blockquote><div><br>Really? Have you tried it? I get at least 5% noise between runs without any changes, and I have gotten results that include *negative* run times. And yes, I tried all the different settings for calibration runs and timing mechanisms.<br><br>The tests in PyBench are not micro-benchmarks (they do far too much for that) and they don't try to minimize overhead or noise, but they are also not representative of real-world code. That doesn't just mean "you can't infer the affected operation from the test name"; it means "you can't infer anything." You can just be looking at differently borrowed runtime.<br><br>I have in the past written patches to Python that improved *every* micro-benchmark and *every* real-world measurement I made, except PyBench. Trying to pinpoint the slowdown invariably led to tests that did too much in the measurement loop, introduced too much noise in the "calibration" run, or simply spent their time *in the measurement loop* doing setup and teardown of the test. Collin and Jeffrey have seen the exact same thing since starting work on Unladen Swallow.<br>
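To make the failure mode concrete, here is a hedged sketch of the pattern I mean (the function and workload are made up for illustration; this is not PyBench's actual code). When the timed loop includes setup and teardown, and a separately-timed "calibration" loop is subtracted from it, both measurements carry their own noise, so the difference can swing by several percent between runs and can even come out negative:

```python
import time

def noisy_benchmark(reps=100):
    """Illustrative sketch (hypothetical code, not PyBench itself):
    the timed loop measures setup and teardown along with the
    operation of interest, then subtracts a separately-timed
    calibration loop.  Each timing run has its own noise, so the
    difference fluctuates -- and if the calibration run happens to be
    slower than the real run, the 'result' is negative."""
    t0 = time.perf_counter()
    for _ in range(reps):
        data = list(range(1000))   # setup, timed along with the operation
        sum(data)                  # the operation we meant to measure
        del data                   # teardown, also timed
    measured = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(reps):
        data = list(range(1000))   # calibration repeats the overhead...
        del data                   # ...and adds its own, independent noise
    calibration = time.perf_counter() - t0

    return measured - calibration  # noisy; occasionally negative
```

The subtraction only cancels the overhead on average; run to run, the two noise terms add rather than cancel, which is exactly why small regressions get lost in it.<br>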
<br>So, sure, it might be "useful" if you have 10% or more difference across the board, and if you don't have access to anything but pybench and pystone.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Perhaps someone should start maintaining a suite of benchmarks, high-level and<br>
low-level; we currently have them all scattered around (pybench, pystone,<br>
stringbench, richards, iobench, and the various Unladen Swallow benchmarks; not<br>
to mention other third-party stuff that can be found in e.g. the Computer<br>
Language Shootout).</blockquote><div><br>That's exactly what Collin proposed at the summits last week. Have you seen <a href="http://code.google.com/p/unladen-swallow/wiki/Benchmarks">http://code.google.com/p/unladen-swallow/wiki/Benchmarks</a><br>
? Please feel free to suggest more benchmarks to add :)<br></div></div><br>-- <br>Thomas Wouters <<a href="mailto:thomas@python.org">thomas@python.org</a>><br><br>Hi! I'm a .signature virus! copy me into your .signature file to help me spread!<br>