Hi!
I wanted to explain a couple of things about the speed website:
- New feature: the Timeline view now defaults to a plot grid, showing all benchmarks at the same time. It was a feature request made more than once, so depending on personal taste, you can bookmark either /overview/ or /timeline/. Thanks go to nsf for helping with the implementation.
- The code has now moved to GitHub as Codespeed, a benchmark visualization framework (http://github.com/tobami/codespeed).
- I have updated speed.pypy.org with version 0.3. Much of the work has been under the hood to make it feasible for other projects to use Codespeed as a framework.
For those interested in further development you can go to the releases wiki (still a work in progress): http://wiki.github.com/tobami/codespeed/releases
Next in line are some DB changes to be able to save standard deviation data and the like. Long-term goals besides world domination are integration with buildbot and similarly unrealistic things. Feedback is always welcome.
Cheers,
Miquel
On 03/10/2010 12:14 PM Miquel Torres wrote:
Nice looking stuff. But a couple of comments:

1. IMO standard deviation is too often worse than useless, since it hides the true nature of the distribution. I think the assumption of normality is highly suspect for benchmark timings, and pruning may hide interesting clusters. I prefer to look at scattergrams, where things like clustering and correlations are easily apparent to the eye, as well as the amount of data (assuming a good mapping of density to visuals).

2. IMO benchmark timings are like travel times, comparing different vehicles (pypy with jit being a vehicle capable of dynamic self-modification ;-). E.g., which part of a trip from Stockholm to Paris would you concentrate on improving to improve the overall result? How about travel from Brussels to Paris? Or Paris to Sydney? ;-P Different things come into play in different benchmarks/trips. A Porsche Turbo and a 2CV will both have to wait for a ferry, if that's part of the trip. IOW, it would be nice to see total time broken down somehow, to see what's really happening. Don't get me wrong, the total times are certainly useful indicators of progress (which has been amazing).

3. Speed is ds/dt, and you are showing the integral of dt/ds over the trip distance to get time. A 25% improvement in total time is not a 25% improvement in speed. I.e. (if you define improvement as a percentage change in a desired direction), for e.g. 25%: distance/(0.75*time) != 1.25*(distance/time). IMO 'speed' (the implication to me in the name speed.pypy.org) would more appropriately be benchmarks/time than time/benchmark. Both measures are useful, but time percentages are easy to mis{use,construe} ;-)

4. Is there any memory footprint data?

Regards,
Bengt Richter
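Point 3 in concrete numbers (a toy trip with made-up figures, not data from the site):

```python
# A 25% reduction in travel time is a ~33% increase in speed, not 25%.
distance = 100.0   # km, arbitrary
time = 10.0        # hours, arbitrary

speed = distance / time                 # 10 km/h
new_time = 0.75 * time                  # a "25% improvement" in time
new_speed = distance / new_time         # 13.33... km/h

speedup_percent = (new_speed / speed - 1) * 100
assert abs(speedup_percent - 100 / 3) < 1e-9   # ~33.3%, not 25%
```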
Hey. I'll answer the questions that are relevant to the benchmarks themselves, not to running them.
On Wed, Mar 10, 2010 at 4:45 PM, Bengt Richter <bokr@oz.net> wrote:
On 03/10/2010 12:14 PM Miquel Torres wrote:
Nice looking stuff. But a couple comments:
1. IMO standard deviation is too often worse than useless, since it hides the true nature of the distribution. I think the assumption of normalcy is highly suspect for benchmark timings, and pruning may hide interesting clusters.
I prefer to look at scattergrams, where things like clustering and correlations are easily apparent to the eye, as well as the amount of data (assuming a good mapping of density to visuals).
That's true. In general a benchmark run over time is a period of warmup, when the JIT compiles assembler, followed by a phase that can be described by an average and standard deviation. Personally I would like to have those 3 measures separated, but didn't implement that yet (it also has some interesting statistical questions involved). Standard deviation is useful for telling whether the difference from a certain checkin was meaningful or just noise.
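The three measures Maciej mentions could be sketched like this (a hypothetical helper with an assumed fixed warmup cutoff; detecting where JIT compilation actually stops is one of the open statistical questions he alludes to):

```python
import statistics

def summarize(timings, warmup=5):
    """Split per-iteration timings into warmup and steady state.

    `warmup=5` is an assumed cutoff for illustration only, not what
    the actual benchmark runner does.
    """
    steady = timings[warmup:]
    return {
        "warmup_total": sum(timings[:warmup]),
        "mean": statistics.mean(steady),
        "stdev": statistics.stdev(steady),
    }

def looks_meaningful(before, after, sigmas=3.0):
    """Crude noise filter: flag a checkin only if the change in the
    steady-state mean exceeds a few standard deviations.
    Not a rigorous statistical test."""
    noise = max(before["stdev"], after["stdev"])
    return abs(before["mean"] - after["mean"]) > sigmas * noise
```

For example, a run of [2.0, 1.5, 1.2, 1.1, 1.05, 0.50, 0.52, 0.48, 0.50] summarizes to a steady-state mean of 0.5s, and a checkin that drops the steady state to ~0.4s registers as meaningful against that noise level.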
2. IMO benchmark timings are like travel times, comparing different vehicles. (pypy with jit being a vehicle capable of dynamic self-modification ;-) E.g., which part of travel from Stockholm to Paris would you concentrate on improving to improve the overall result? How about travel from Brussels to Paris? Or Paris to Sydney? ;-P Different things come into play in different benchmarks/trips. A Porsche Turbo and a 2CV will both have to wait for a ferry, if that's part of the trip.
IOW, it would be nice to see total time broken down somehow, to see what's really happening.
I can't agree more with that. We already do split timings when we run benchmarks by hand, but they're not yet integrated into the nightly run. Total time is what users see, though, which is why our public site focuses on that. I want more information available, but we only have a limited amount of manpower, and Miquel has already done quite an amazing job in my opinion :-) We'll probably go into more detail. The part we want to focus on post-release is speeding up certain parts of tracing as well as limiting its GC pressure. As you can see, the split would be very useful for our development.
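The kind of split timing described here could look something like the following (a hypothetical instrumentation helper, not what the PyPy runner actually uses):

```python
import time
from contextlib import contextmanager

timings = {}  # phase name -> accumulated seconds

@contextmanager
def phase(name):
    """Accumulate wall-clock time under a named phase such as
    'tracing', 'gc', or 'running'."""
    start = time.time()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.time() - start

# A nightly run instrumented this way could report per-phase totals
# instead of only a single number per benchmark.
with phase("running"):
    sum(range(1000))
```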
Don't get me wrong, the total times are certainly useful indicators of progress (which has been amazing).
3. Speed is ds/dt and you are showing the integral of dt/ds over the trip distance to get time. A 25% improvement in total time is not a 25% improvement in speed. I.e., (if you define improvement as a percentage change in a desired direction), for e.g. 25%: distance/(0.75*time) != 1.25*(distance/time).
IMO 'speed' (the implication to me in the name speed.pypy.org) would be benchmarks/time more appropriately than time/benchmark.
Both measures are useful, but time percentages are easy to mis{use,construe} ;-)
That's correct. Benchmarks are in general very easy to lie about, and they're by definition flawed. That's why I always include the raw data when I publish stuff on the blog, so people can work on it themselves.
4. Is there any memory footprint data?
No. Memory measurement is hard, and it's even less useful without a breakdown. These particular benchmarks are not a very good basis for memory measurement - in the case of pypy you would mostly observe the default allocated memory (which is roughly 10M for the interpreter + 16M for the semispace GC + the cache for the nursery). Also, our GC is of a kind that can run faster if you give it more memory (not that we use this feature, but it's possible).
Cheers,
fijal
btw, for python memory usage on linux there is /proc/PID/status. Here is some code for linux...
wget http://rene.f0o.com/~rene/stuff/memory_usage.py

import memory_usage
bytes_of_resident_memory = memory_usage.resident()
Should be easy enough to add that to the benchmarks at the start and end? Maybe calling it in the middle would be a little harder... but not too hard. TODO: it would need to be updated for other platforms, and to support measuring child processes, plus tests and code cleanup :)
cu,
On Thu, Mar 11, 2010 at 12:32 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
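For reference, a minimal self-contained equivalent of such a helper (a sketch assuming Linux and the /proc/PID/status format; the actual memory_usage.py is not reproduced here):

```python
def resident_bytes(pid="self"):
    """Return the resident set size (VmRSS) in bytes by parsing
    /proc/<pid>/status. Linux-only; the kernel reports VmRSS in kB."""
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like: 'VmRSS:     12345 kB'
                return int(line.split()[1]) * 1024
    raise RuntimeError("VmRSS not found; is this Linux?")
```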
So one tiny pony I have is that, on the tabular timeline page (http://speed.pypy.org/timeline/), mousing over a graph should not show the "coordinates": on graphs of those sizes I don't think it adds any value, and it's fairly distracting.
Thanks for your work,
Alex
--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you want" -- Me
On Mar 12, 2010, at 6:08 PM, Alex Gaynor wrote:
So one tiny pony I have is that, on the tabular timeline page (http://speed.pypy.org/timeline/), mousing over a graph should not show the "coordinates": on graphs of those sizes I don't think it adds any value, and it's fairly distracting.
For a start it could be removed (that should be pretty easy) but as a second step it would be interesting to highlight and maybe show the revision or time of the closest point (if revision then highlight all points of that revision).
Thanks for your work, Alex -- Leonardo Santagada santagada at gmail.com
On Fri, Mar 12, 2010 at 06:37:49PM -0300, Leonardo Santagada wrote:
For a start it could be removed (that should be pretty easy)
Actually, if you added units to those numbers, they could answer important questions like "is a higher line better or is a lower line better?"
but as a second step it would be interesting to highlight and maybe show the revision or time of the closest point (if revision then highlight all points of that revision).
Some kind of rounding would be nice, as seeing "0.6 seconds in revision 71807.4" is a bit weird. Very shiny website, BTW, I love it.
Marius Gedminas
--
Q: How many IBM CPU's does it take to execute a job?
A: Four; three to hold it down, and one to rip its head off.
On Mar 13, 2010, at 10:28 PM, Marius Gedminas wrote:
Actually, if you added units to those numbers, they could answer important questions like "is a higher line better or is a lower line better?"
Yes, but the axes should be labeled in an always-visible place, so that when people see the graphs they know what they mean.
but as a second step it would be interesting to highlight and maybe show the revision or time of the closest point (if revision then highlight all points of that revision).
Some kind of rounding would be nice, as seeing "0.6 seconds in revision 71807.4" is a bit weird.
No rounding, but actually showing the data for the closest point rather than where the mouse is.
Very shiny website, BTW, I love it.
I think this is a clear way to show performance to non-developers, and it is great even for developers; it is a win-win website :)
--
Leonardo Santagada
santagada at gmail.com
On Sun, Mar 14, 2010 at 07:11:56AM -0300, Leonardo Santagada wrote:
On Mar 13, 2010, at 10:28 PM, Marius Gedminas wrote:
Some kind of rounding would be nice, as seeing "0.6 seconds in revision 71807.4" is a bit weird.
No rounding, but actually showing the data for the closest point rather than where the mouse is.
Sorry, yes, I meant rounding of coordinates to the nearest data point, not rounding the interpolated data to appear more truthy.
Marius Gedminas
--
UNIX is user friendly. It's just selective about who its friends are.
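What Marius and Leonardo are converging on, as a sketch (in Python for illustration, not the site's actual JavaScript plotting code):

```python
def nearest_point(points, mouse_x):
    """Snap a mouseover x-position to the closest measured data point.

    `points` is a list of (revision, seconds) pairs; instead of showing
    an interpolated position like "revision 71807.4", return the real
    point nearest to the cursor.
    """
    return min(points, key=lambda p: abs(p[0] - mouse_x))

# e.g. hovering at x=71807.4 over these (revision, seconds) samples
# snaps to the measurement at revision 71807:
print(nearest_point([(71805, 0.61), (71807, 0.60), (71810, 0.58)], 71807.4))
```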
Hey, thanks for the feedback.
- In the timeline grid, I agree showing coordinates is not useful. I have disabled that.
- The rounding: yes, it is pretty stupid to show "revision 71807.4". That goes away once coordinates are disabled, though.
I think this is a clear way to show performance for non developers and is great even for developers, it is a win win website :)
I'm very glad to hear that. That was exactly my intention when starting the project :-D I had to scratch my itch of wanting to follow pypy's performance better as an ordinary python developer, but I also recognized that, being such a performance-oriented project, pypy badly needed good performance regression monitoring and progress tracking.
Cheers!
Miquel
2010/3/14 Leonardo Santagada <santagada@gmail.com>
- About showing axis labels: it can be done, of course, as they are shown for a single timeline plot. But please bear in mind that the timeline grid is a bit of a compromise UI-wise. It currently shows 20 plots (!) at the same time (probably more in the future if Maciej keeps adding more). Showing "seconds" or "lower is better" in all of them would be a big waste of screen real estate. You also have to assume that people who access this site are knowledgeable enough to interpret plots correctly, especially since the single plot view does specify units and ALL y-axes are in seconds. It is not as if they change to flops, then to mips, and back to seconds again...
2010/3/14 Miquel Torres <tobami@googlemail.com>
Hello. Please keep in mind that I'm viewing that on a 12" monitor, which makes screen space even more valuable ;-)
Cheers,
fijal
On Sun, Mar 14, 2010 at 11:06 AM, Miquel Torres <tobami@googlemail.com> wrote:
participants (7)
- Alex Gaynor
- Bengt Richter
- Leonardo Santagada
- Maciej Fijalkowski
- Marius Gedminas
- Miquel Torres
- René Dudfield