[pypy-dev] MalGen as a benchmark?
Alex Gaynor
alex.gaynor at gmail.com
Sat Sep 29 01:39:05 CEST 2012
On Fri, Sep 28, 2012 at 4:36 PM, Chris Leary <cdleary at acm.org> wrote:
> Found a red-hot, branchy-looking Python kernel in the wild and
> naturally I thought of you trace compiler folks! ;-) Hope that it
> might be useful: I think it could make a nice addition to the speed
> center, seeing as how it's a CPU-bound workload on all the machines I
> have access to (though I haven't profiled it at all, so it could
> potentially be leaning heavily on paths in some unoptimized builtins).
>
> MalGen is a set of scripts which generate large, distributed data
> sets suitable for testing and benchmarking software designed to
> perform parallel processing on large data sets. The data sets can be
> thought of as site-entity log files. After an initial seeding, the
> scripts allow for the data generation to be initiated from a single
> central node to run the generation concurrently on multiple remote
> nodes of the cluster.
>
> -- http://code.google.com/p/malgen/
>
> Specifically,
> http://code.google.com/p/malgen/source/browse/trunk/bin/cloud/malgen/malgen.py
> which gets run thusly:
>
> ::
>
> pypy malgen.py -O /tmp/ -o INITIAL.txt 0 50000000 10000000 21
>
> (Where 5e7 is the "initial block size" and 1e7 is the
> other-than-initial block size.) This generates the initial seeding they
> were talking about. It is followed by one run for each of N blocks on
> each node (in this hypothetical setup, 5 blocks on each of four nodes),
> and each of those runs looks like this:
>
> ::
>
> pypy malgen.py -O /tmp [start_value]
>
> The metadata is read out of the INITIAL.txt file and used to determine
> the size of the block, and the parameter [start_value] is used to bump
> to the appropriate start id count for the current block.
>
> Inner loop:
> http://code.google.com/p/malgen/source/browse/trunk/bin/cloud/malgen/malgen.py#90
>
> Thoughts?
>
> - Leary
Looks like it could be a good addition. Have you run benchmarks on it
yourself? (Also, should we be directing any new benchmarks to the
python-speed mailing list?)
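
For a first rough comparison, something like the sketch below might do. It
only wraps the seeding command quoted above in a wall-clock timer. The
helper names (build_argv, time_command, compare) are made up for
illustration, and it assumes malgen.py is in the current directory and
that the interpreters you pass in are on your PATH. It is not how the
speed center measures anything.

```python
import subprocess
import time

# Arguments copied verbatim from the seeding command quoted above.
MALGEN_ARGS = ["-O", "/tmp/", "-o", "INITIAL.txt",
               "0", "50000000", "10000000", "21"]


def build_argv(interpreter, script="malgen.py", args=MALGEN_ARGS):
    """Build the command line for one interpreter (hypothetical helper)."""
    return [interpreter, script] + list(args)


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the run exits non-zero.
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare(interpreters):
    """Time the seeding run once under each interpreter and print the result."""
    for interpreter in interpreters:
        elapsed = time_command(build_argv(interpreter))
        print("%s: %.1f s" % (interpreter, elapsed))
```

Calling compare(["python", "pypy"]) would then print one timing per
interpreter. A single run each is noisy, so treat the numbers as a sanity
check rather than a benchmark result.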
Alex
--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero