btw, for Python memory usage on Linux, look at
/proc/PID/status
Here is some code for Linux:
wget http://rene.f0o.com/~rene/stuff/memory_usage.py
>>> import memory_usage
>>> bytes_of_resident_memory = memory_usage.resident()
It should be easy enough to add that to the benchmarks at the start and end. Calling it in the middle would be a little harder... but not too hard.
TODO: it would need updating for other platforms, support for measuring child processes, tests, and a code cleanup :)
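The memory_usage.py module from that URL isn't reproduced here, so what follows is only a minimal sketch of the same idea, assuming Linux and the VmRSS line of /proc/PID/status. resident() mirrors the name used above but is a reimplementation, not the original; PeakSampler and measure() are hypothetical helpers that take readings at the start and end and poll from a background thread to catch the peak "in the middle".

```python
import threading


def resident(pid="self"):
    """Return the resident set size of a process in bytes.

    Assumes Linux, where /proc/<pid>/status has a line like
    'VmRSS:     12345 kB'.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Fields: 'VmRSS:', value, unit ('kB').
                return int(line.split()[1]) * 1024
    raise RuntimeError(f"no VmRSS line for pid {pid!r}")


class PeakSampler:
    """Poll resident() from a background thread to catch the peak mid-run."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.start = self.end = self.peak = 0
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def _poll(self):
        while not self._done.is_set():
            self.peak = max(self.peak, resident())
            self._done.wait(self.interval)

    def __enter__(self):
        self.start = self.peak = resident()
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._done.set()
        self._thread.join()
        self.end = resident()
        self.peak = max(self.peak, self.end)
        return False


def measure(benchmark):
    """Run benchmark(); return resident bytes at start, at end, and the peak seen."""
    with PeakSampler() as sampler:
        benchmark()
    return {"start": sampler.start, "end": sampler.end, "peak": sampler.peak}
```

Polling can miss spikes shorter than the interval. The same file also has VmHWM (the peak RSS "high water mark"), but that covers the whole process lifetime rather than just the benchmark.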
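For the "measuring child processes" item, one option on Unix (an assumption, not what memory_usage.py does) is to spawn the benchmark and reap it with os.wait4(), which returns the rusage of that particular child; its ru_maxrss is the child's peak RSS. child_peak_rss is a hypothetical helper name, and the command in the example is a stand-in for a real benchmark.

```python
import os
import sys


def child_peak_rss(argv):
    """Spawn argv, wait for it, and return (exit code, peak RSS of that child).

    os.wait4() hands back the rusage of exactly this child, unlike
    getrusage(RUSAGE_CHILDREN), which reports the largest child seen so far.
    ru_maxrss is in kilobytes on Linux (bytes on macOS).
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss


if __name__ == "__main__":
    # Hypothetical benchmark command; replace with the real benchmark script.
    code, peak_kb = child_peak_rss([sys.executable, "-c", "x = [0] * 10**6"])
    print(f"exit={code} peak={peak_kb} kB")
```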
cu,
Hey.
I'll answer the questions that are relevant to the benchmarks themselves and
not to running them.
On Wed, Mar 10, 2010 at 4:45 PM, Bengt Richter <bokr@oz.net> wrote:
> On 03/10/2010 12:14 PM Miquel Torres wrote:
>> Hi!
>>
>> I wanted to explain a couple of things about the speed website:
>>
>> - New feature: the Timeline view now defaults to a plot grid, showing
>> all benchmarks at the same time. It was a feature request made more
>> than once, so depending on personal tastes, you can bookmark either
>> /overview/ or /timeline/. Thanks go to nsf for helping with the
>> implementation.
>> - The code has now moved to github as Codespeed, a benchmark
>> visualization framework (http://github.com/tobami/codespeed)
>> - I have updated speed.pypy.org with version 0.3. Much of the work has
>> been under the hood to make it feasible for other projects to use
>> codespeed as a framework.
>>
>> For those interested in further development you can go to the releases
>> wiki (still a work in progress):
>> http://wiki.github.com/tobami/codespeed/releases
>>
>> Next in the line are some DB changes to be able to save standard
>> deviation data and the like. Long term goals besides world domination
>> are integration with buildbot and similarly unrealistic things.
>> Feedback is always welcome.
>
> Nice looking stuff. But a couple comments:
>
> 1. IMO standard deviation is too often worse than useless, since it hides
> the true nature of the distribution. I think the assumption of normalcy
> is highly suspect for benchmark timings, and pruning may hide interesting clusters.
>
> I prefer to look at scattergrams, where things like clustering and correlations
> are easily apparent to the eye, as well as the amount of data (assuming a good
> mapping of density to visuals).
That's true. In general a benchmark run over time consists of a period of
warmup, when the JIT compiles to assembler, followed by a phase that can be
described by an average and a standard deviation. Personally I would like to have
those 3 measures separated, but I haven't implemented that yet (it also raises
some interesting statistical questions). The standard deviation is
useful for telling whether a difference after a certain checkin was meaningful
or just noise.
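The split described above can be sketched roughly like this. It is a minimal illustration, not what speed.pypy.org or Codespeed actually does: it assumes you already know how many leading iterations are warmup (the `warmup` count is just a parameter, which sidesteps the hard statistical question), and `welch_t` compares two checkins with Welch's t statistic, which leans on the steady state being roughly normal, the very assumption Bengt questions. All names are hypothetical.

```python
import math
import statistics


def split_warmup(timings, warmup):
    """Split per-iteration timings into (warmup part, steady-state part)."""
    return timings[:warmup], timings[warmup:]


def summarize(timings, warmup):
    """Return the three measures: total warmup time, steady mean, steady stdev."""
    head, steady = split_warmup(timings, warmup)
    return sum(head), statistics.mean(steady), statistics.stdev(steady)


def welch_t(a, b):
    """Welch's t statistic for two samples of steady-state timings.

    As a rough rule of thumb for reasonably large samples, |t| well above 2
    suggests the difference between two checkins is more than noise.
    """
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / math.sqrt(va / len(a) + vb / len(b))
```

Plotting the raw timings first (as Bengt suggests) is a cheap way to check whether a single mean and stdev describe the steady state at all.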
>
> 2. IMO benchmark timings are like travel times, comparing different vehicles.
> (pypy with jit being a vehicle capable of dynamic self-modification ;-)
> E.g., which part of travel from Stockholm to Paris would you concentrate
> on improving to improve the overall result? How about travel from Brussels to Paris?
> Or Paris to Sydney? ;-P Different things come into play in different benchmarks/trips.
> A Porsche Turbo and a 2CV will both have to wait for a ferry, if that's part of the trip.
>
> IOW, it would be nice to see total time broken down somehow, to see what's really
> happening.
I couldn't agree more with that. We already split the time when we run
benchmarks by hand, but that isn't integrated into the
nightly run yet. Total time is what users see, though, which is why our
public site is focused on it. I want more information available, but
we have only a limited amount of manpower, and Miquel has already done quite an
amazing job in my opinion :-) We'll probably go into more detail.
The part we want to focus on post-release is speeding up certain parts
of tracing as well as limiting its GC pressure. As you can see, the
split would be very useful for our development.
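A real tracing/GC split for PyPy needs hooks inside the VM, which I won't guess at. As a rough, CPython-only illustration of one slice of such a breakdown, the sketch below uses gc.callbacks (called with phase "start" and "stop" around each collection) to measure how much of a run goes to the cyclic GC. GCTimer is a hypothetical name, and this is not how PyPy measures its own GC pressure.

```python
import gc
import time


class GCTimer:
    """Measure how much of a run is spent in CPython's cyclic GC.

    gc.callbacks (CPython 3.3+) entries are called with phase "start"
    before and "stop" after every collection.
    """

    def __init__(self):
        self.gc_time = 0.0
        self.collections = 0
        self._started = None

    def _callback(self, phase, info):
        if phase == "start":
            self._started = time.perf_counter()
        elif phase == "stop" and self._started is not None:
            self.gc_time += time.perf_counter() - self._started
            self.collections += 1
            self._started = None

    def run(self, benchmark):
        """Run benchmark(); return (total seconds, seconds spent in GC)."""
        gc.callbacks.append(self._callback)
        start = time.perf_counter()
        try:
            benchmark()
        finally:
            total = time.perf_counter() - start
            gc.callbacks.remove(self._callback)
        return total, self.gc_time
```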
>
> Don't get me wrong, the total times are certainly useful indicators of progress
> (which has been amazing).
>
> 3. Speed is ds/dt and you are showing the integral of dt/ds over the trip distance to get time.
> A 25% improvement in total time is not a 25% improvement in speed. I.e., (if you define
> improvement as a percentage change in a desired direction), for e.g. 25%:
> distance/(0.75*time) != 1.25*(distance/time).
>
> IMO 'speed' (the implication to me in the name speed.pypy.org) would be benchmarks/time
> more appropriately than time/benchmark.
>
> Both measures are useful, but time percentages are easy to mis{use,construe} ;-)
That's correct.
Benchmarks are in general very easy to lie about, and they're flawed by
definition. That's why I always include the raw data when I publish
stuff on the blog, so people can work with it themselves.
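Bengt's point 3 as plain arithmetic: with speed = d/t, cutting time by 25% gives d/(0.75t) = (4/3)(d/t), about 33% more speed, while a 25% gain in speed only brings time down to t/1.25 = 0.8t, a 20% cut. The two hypothetical helpers below convert between the two kinds of percentage.

```python
def speedup_from_time_cut(time_cut):
    """Relative speed (new/old) after run time shrinks by the fraction time_cut.

    speed = distance / time, so new/old = t / ((1 - time_cut) * t).
    """
    return 1.0 / (1.0 - time_cut)


def time_cut_from_speedup(speed_gain):
    """Fraction of run time saved when speed grows by the fraction speed_gain."""
    return 1.0 - 1.0 / (1.0 + speed_gain)


for cut in (0.10, 0.25, 0.50):
    gain = speedup_from_time_cut(cut) - 1.0
    print(f"{cut:.0%} less time = {gain:.1%} more speed")
# prints:
# 10% less time = 11.1% more speed
# 25% less time = 33.3% more speed
# 50% less time = 100.0% more speed
```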
>
> 4. Is there any memory footprint data?
>
No. Memory measurement is hard, and it's even less useful without a
breakdown. These particular benchmarks are not a very good basis for
memory measurement: in the case of PyPy you would mostly observe the
default allocated memory (roughly 10M for the interpreter +
16M for the semispace GC + the cache for the nursery).
Also, our GC is of a kind that can run faster if you give it more
memory (not that we use this feature, but it's possible).
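For the kind of breakdown mentioned above, at least for Python-level allocations, CPython's tracemalloc module can attribute memory to source lines, and it only counts what was allocated after tracing started, so the interpreter's baseline drops out. This is a swapped-in CPython tool: I'm not assuming PyPy supports it, and it says nothing about semispace or nursery sizes. allocation_breakdown is a hypothetical helper.

```python
import tracemalloc


def allocation_breakdown(benchmark, limit=3):
    """Run benchmark() under tracemalloc.

    Returns (peak traced bytes, [(site, bytes still allocated), ...]) for the
    `limit` largest allocation sites still alive when benchmark() returns.
    Only allocations made after tracemalloc.start() are counted.
    """
    tracemalloc.start()
    try:
        benchmark()
        _, peak = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # statistics() is sorted from the biggest site to the smallest.
    top = snapshot.statistics("lineno")[:limit]
    return peak, [(str(stat.traceback), stat.size) for stat in top]
```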
Cheers,
fijal
_______________________________________________
pypy-dev@codespeak.net
http://codespeak.net/mailman/listinfo/pypy-dev