MalGen as a benchmark?
Found a red-hot, branchy-looking Python kernel in the wild and naturally I thought of you trace compiler folks! ;-) Hope that it might be useful: I think it could make a nice addition to the speed center, seeing as how it's a CPU-bound workload on all the machines I have access to (though I haven't profiled it at all, so it could potentially be leaning heavily on paths in some unoptimized builtins).

    MalGen is a set of scripts which generate large, distributed data sets suitable for testing and benchmarking software designed to perform parallel processing on large data sets. The data sets can be thought of as site-entity log files. After an initial seeding, the scripts allow for the data generation to be initiated from a single central node to run the generation concurrently on multiple remote nodes of the cluster.

    -- http://code.google.com/p/malgen/

Specifically, http://code.google.com/p/malgen/source/browse/trunk/bin/cloud/malgen/malgen.... which gets run thusly:

::

    pypy malgen.py -O /tmp/ -o INITIAL.txt 0 50000000 10000000 21

(Here 5e7 is the "initial block size" and 1e7 is the other-than-initial block size.) This generates the initial seeding they were talking about. It is followed by a run for each of N blocks on each node; in this hypothetical setup, for 5 blocks on each of four nodes, the following is run:

::

    pypy malgen.py -O /tmp [start_value]

The metadata is read out of the INITIAL.txt file and used to determine the size of the block, and the parameter [start_value] is used to bump to the appropriate starting id for the current block.

Inner loop: http://code.google.com/p/malgen/source/browse/trunk/bin/cloud/malgen/malgen....

Thoughts?

- Leary
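[Editor's note: for readers who don't want to click through to the inner-loop link above, the hot path described is essentially a loop that emits synthetic site-entity log records. The sketch below is not MalGen's actual code; the field names, event mix, and CSV-ish record format are invented purely for illustration of that kind of branchy generation loop.]

::

    # Hypothetical illustration only: this is NOT MalGen's inner loop, just a
    # sketch of the kind of branchy, record-at-a-time generation it describes.
    import random


    def generate_block(start_id, block_size, num_sites=100, seed=21):
        """Yield one synthetic site-entity log record per id in the block."""
        rng = random.Random(seed)
        for event_id in range(start_id, start_id + block_size):
            site = rng.randint(0, num_sites - 1)
            entity = rng.randint(0, 10 * num_sites - 1)
            # The branchy part: the record shape depends on several random
            # draws, so each iteration takes a data-dependent path.
            draw = rng.random()
            if draw < 0.01:
                kind, value = "purchase", rng.randint(1, 500)
            elif draw < 0.2:
                kind, value = "click", 0
            else:
                kind, value = "view", 0
            yield "%d,%d,%d,%s,%d" % (event_id, site, entity, kind, value)


    if __name__ == "__main__":
        # Tiny demo block; the runs discussed in this thread use 1e7-5e7.
        for record in generate_block(start_id=0, block_size=5):
            print(record)

The point of the sketch is only that every iteration mixes RNG, integer, and string work behind unpredictable branches, rather than sitting in one tight numeric loop.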
On Fri, Sep 28, 2012 at 4:36 PM, Chris Leary <cdleary@acm.org> wrote:
> Found a red-hot, branchy-looking Python kernel in the wild and naturally I thought of you trace compiler folks! ;-) Hope that it might be useful: I think it could make a nice addition to the speed center, seeing as how it's a CPU-bound workload on all the machines I have access to (though I haven't profiled it at all, so it could potentially be leaning heavily on paths in some unoptimized builtins).
> [..snip..]
Looks like it could be a good addition; have you run benchmarks on it yourself? (Also, should we be directing any new benchmarks to the python-speed mailing list?)

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
On Fri, Sep 28, 2012 at 4:39 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
> Looks like it could be a good addition; have you run benchmarks on it yourself? (Also, should we be directing any new benchmarks to the python-speed mailing list?)
It's the setup procedure for the MalStone map-reduce benchmark, but it often ends up taking four times as long as the benchmark itself for large datasets! Should I cross-post to python-speed? The site says to post here: http://speed.pypy.org/about/ -- is there any additional information you think I should include to cross-post there? Thanks.

- Leary
On Sat, Sep 29, 2012 at 3:22 AM, Chris Leary <cdleary@acm.org> wrote:
> On Fri, Sep 28, 2012 at 4:39 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
>> Looks like it could be a good addition; have you run benchmarks on it yourself? (Also, should we be directing any new benchmarks to the python-speed mailing list?)
> It's the setup procedure for the MalStone map-reduce benchmark, but it often ends up taking four times as long as the benchmark itself for large datasets! Should I cross-post to python-speed? The site says to post here: http://speed.pypy.org/about/ -- is there any additional information you think I should include to cross-post there? Thanks.
> - Leary
I think if we don't include them, speed.python won't include them for sure. I'll try to deal with it some time today.
On Sat, Sep 29, 2012 at 3:22 AM, Chris Leary <cdleary@acm.org> wrote:
> It's the setup procedure for the MalStone map-reduce benchmark, but it often ends up taking four times as long as the benchmark itself for large datasets!
> [..snip..]
at current svn version it plain doesn't work without seed data
On Sun, Sep 30, 2012 at 6:08 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
> at current svn version it plain doesn't work without seed data
Yeah, the first step is that it has to generate seed data. Since it's the same loop I'd hope that's good enough.

Python2.7 seems to run ~2 seconds faster on a ~minute length run.

::

    $ perf stat pypy malgen.py -O /tmp/ -o INITIAL.txt 0 5000000 1000000 21
    0 Entity 0 Events Generated 500000 Events Generated
    [..snip..]

    Performance counter stats for 'pypy malgen.py -O /tmp/ -o INITIAL.txt 0 5000000 1000000 21':

         65447.670235 task-clock                #    0.931 CPUs utilized
                6,091 context-switches          #    0.000 M/sec
                   88 CPU-migrations            #    0.000 M/sec
               39,751 page-faults               #    0.001 M/sec
      187,721,401,361 cycles                    #    2.868 GHz                     [83.34%]
      125,966,916,332 stalled-cycles-frontend   #   67.10% frontend cycles idle    [83.33%]
       89,836,165,138 stalled-cycles-backend    #   47.86% backend cycles idle     [66.63%]
      122,596,433,926 instructions              #    0.65  insns per cycle
                                                #    1.03  stalled cycles per insn [83.30%]
       27,158,701,261 branches                  #  414.968 M/sec                   [83.35%]
        1,309,172,455 branch-misses             #    4.82% of all branches         [83.35%]

         70.276668518 seconds time elapsed

    $ perf stat python2.7 malgen.py -O /tmp/ -o INITIAL.txt 0 5000000 1000000 21
    0 Entity 0 Events Generated 500000 Events Generated
    [..snip..]

    Performance counter stats for 'python2.7 malgen.py -O /tmp/ -o INITIAL.txt 0 5000000 1000000 21':

         67696.460942 task-clock                #    0.991 CPUs utilized
                6,192 context-switches          #    0.000 M/sec
                   87 CPU-migrations            #    0.000 M/sec
                4,168 page-faults               #    0.000 M/sec
      194,918,427,158 cycles                    #    2.879 GHz                     [83.34%]
       95,351,613,483 stalled-cycles-frontend   #   48.92% frontend cycles idle    [83.32%]
       53,693,951,677 stalled-cycles-backend    #   27.55% backend cycles idle     [66.68%]
      209,613,931,049 instructions              #    1.08  insns per cycle
                                                #    0.45  stalled cycles per insn [83.35%]
       44,855,636,904 branches                  #  662.599 M/sec                   [83.32%]
        1,687,165,902 branch-misses             #    3.76% of all branches         [83.34%]

         68.335222479 seconds time elapsed

- Leary
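[Editor's note: a quick back-of-the-envelope comparison of the two runs, using only the counter values quoted above. The snippet just recomputes ratios perf already reports, so nothing here is new data or MalGen-specific.]

::

    # Recompute the headline ratios from the perf counters quoted above.
    # PyPy retires roughly 40% fewer instructions than CPython 2.7, but at a
    # much lower IPC and a higher branch-miss rate, so wall-clock time ends
    # up similar on this run.
    runs = {
        "pypy": {"cycles": 187721401361.0, "instructions": 122596433926.0,
                 "branches": 27158701261.0, "branch_misses": 1309172455.0,
                 "seconds": 70.276668518},
        "python2.7": {"cycles": 194918427158.0, "instructions": 209613931049.0,
                      "branches": 44855636904.0, "branch_misses": 1687165902.0,
                      "seconds": 68.335222479},
    }

    for name in sorted(runs):
        c = runs[name]
        ipc = c["instructions"] / c["cycles"]
        miss_rate = 100.0 * c["branch_misses"] / c["branches"]
        print("%-10s %.2f insns/cycle, %.2f%% branch misses, %.1f s elapsed"
              % (name, ipc, miss_rate, c["seconds"]))

This works out to ~0.65 insns/cycle with ~4.8% branch misses for PyPy versus ~1.08 and ~3.8% for CPython, matching the figures perf printed above.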
On Sun, Sep 30, 2012 at 7:56 PM, Chris Leary <cdleary@acm.org> wrote:
> On Sun, Sep 30, 2012 at 6:08 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
>> at current svn version it plain doesn't work without seed data
> Yeah, the first step is that it has to generate seed data. Since it's the same loop I'd hope that's good enough.
how do you do that?
> Python2.7 seems to run ~2 seconds faster on a ~minute length run.
> [..snip..]
participants (3)

- Alex Gaynor
- Chris Leary
- Maciej Fijalkowski