Antoine Pitrou wrote:
[snip]
>
> By the way, the Mako benchmark shows a worrying regression (3x slower)
> on your new dict implementation.
Take a look at the timeline graph: is it very noisy?
There is a flaw in the benchmarking code in that the runs are not
interleaved, so other processes tend to introduce systematic errors.
For example, here is a run of mako (comparing my dict with tip):
### mako ###
Min: 0.805583 -> 0.839515: 1.04x slower
Avg: 0.831936 -> 0.910184: 1.09x slower
Significant (t=-3.25)
Stddev: 0.01302 -> 0.11953: 9.1820x larger
It is 9% slower, right?
Wrong. Take a look at the timeline:
http://tinyurl.com/82l9jna
It's 1-2% slower, but another process grabs the CPU for some of the
iterations.
This should not be a problem for speed.python.org as it will have a
dedicated machine, but you need to be careful when benchmarking on your
desktop machine.
As an experiment, try benchmarking a python build against itself and see
what you get.
Cheers,
Mark.
>
> But speaking of benchmarks that won't work on other VMs (e.g. twisted under
> Jython), we will obviously try to minimize how many of those we have.
> Twisted is somewhat of a special case because (a) PyPy has already put the
> time into creating the benchmarks and
It's an official Twisted benchmark suite, created by JP Calderone
(almost exclusively). We claim no credit.
Cheers,
fijal
On Thu, Feb 2, 2012 at 12:06 PM, Brett Cannon <brett(a)python.org> wrote:
>
>
> On Thu, Feb 2, 2012 at 08:42, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
>
>> On Thu, Feb 2, 2012 at 3:40 PM, Stefan Behnel <stefan_ml(a)behnel.de> wrote:
>> > Maciej Fijalkowski, 02.02.2012 14:35:
>> >>> Oh, we have that feature, it's called CPython. The thing is that Cython
>> >>> doesn't get to see the generated sources, so it won't compile them and
>> >>> instead, CPython ends up executing the code at normal interpreted speed.
>> >>> So there's nothing gained by running the benchmark at all. And even if we
>> >>> found a way to hook into this machinery, I doubt that the static compiler
>> >>> overhead would make this any useful. The whole purpose of generating code
>> >>> is that it likely will not look the same the next time you do it (well,
>> >>> outside of benchmarks, that is), so even a cache is unlikely to help much
>> >>> for real code. It's like PyPy running code in interpreting mode before it
>> >>> gets compiled, except that Cython will never compile this code, even if
>> >>> it turns out to be worth it.
>> >>>
>> >>> Personally, I rather consider it a feature that users can employ exec()
>> >>> from their Cython code to run code in plain CPython (for whatever reason).
>> >>
>> >> Yes, ok, but I believe this should mean "Cython does not give speedups
>> >> on this benchmark" and not "we should modify the benchmark".
>> >
>> > Oh, I hadn't suggested to modify it. I was merely stating (as part of a
>> > longer list) that it's of no use specifically to Cython. I.e., if there's
>> > something to gain from having the benchmark runs take less time by
>> > disabling benchmarks for specific runtimes, it's one of the candidates on
>> > our side.
>> >
>> > Stefan
>>
>> Oh ok, I misread you then, sorry.
>>
>> I think having a dedicated speed.python machine is specifically so
>> that we can run benchmarks however much we want :) At least Cython
>> does not compile for like an hour...
>
>
> Yeah, we have tried to make sure the machine we have for all of this is
> fast enough that we can run all the benchmarks on all measured VMs once a
> day. If that ever becomes an issue we would probably prune the benchmarks
> rather than turn them on/off selectively.
>
> But speaking of benchmarks that won't work on other VMs (e.g. twisted
> under Jython), we will obviously try to minimize how many of those we have.
> Twisted is somewhat of a special case because (a) PyPy has already put the
> time into creating the benchmarks and (b) it is used by so many people that
> measuring its speed is a good thing. Otherwise I would argue that all
> future benchmarks should be runnable on any VM and not just CPython or any
> other VMs that support C extensions (numpy is the only exception to this
> that I can think of because of its popularity and numpypy will extend its
> reach once the work is complete).
>
>
>
I think, ideally, the benchmarks should all be valid pure Python. Without
wanting to beat up on Jython, I don't think the fact that Twisted doesn't run
there is an argument against including it.
Alex
--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
On Thu, Feb 2, 2012 at 08:42, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
> On Thu, Feb 2, 2012 at 3:40 PM, Stefan Behnel <stefan_ml(a)behnel.de> wrote:
> > Maciej Fijalkowski, 02.02.2012 14:35:
> >>> Oh, we have that feature, it's called CPython. The thing is that Cython
> >>> doesn't get to see the generated sources, so it won't compile them and
> >>> instead, CPython ends up executing the code at normal interpreted speed.
> >>> So there's nothing gained by running the benchmark at all. And even if we
> >>> found a way to hook into this machinery, I doubt that the static compiler
> >>> overhead would make this any useful. The whole purpose of generating code
> >>> is that it likely will not look the same the next time you do it (well,
> >>> outside of benchmarks, that is), so even a cache is unlikely to help much
> >>> for real code. It's like PyPy running code in interpreting mode before it
> >>> gets compiled, except that Cython will never compile this code, even if it
> >>> turns out to be worth it.
> >>>
> >>> Personally, I rather consider it a feature that users can employ exec()
> >>> from their Cython code to run code in plain CPython (for whatever reason).
> >>
> >> Yes, ok, but I believe this should mean "Cython does not give speedups
> >> on this benchmark" and not "we should modify the benchmark".
> >
> > Oh, I hadn't suggested to modify it. I was merely stating (as part of a
> > longer list) that it's of no use specifically to Cython. I.e., if there's
> > something to gain from having the benchmark runs take less time by
> > disabling benchmarks for specific runtimes, it's one of the candidates on
> > our side.
> >
> > Stefan
>
> Oh ok, I misread you then, sorry.
>
> I think having a dedicated speed.python machine is specifically so
> that we can run benchmarks however much we want :) At least Cython
> does not compile for like an hour...
Yeah, we have tried to make sure the machine we have for all of this is
fast enough that we can run all the benchmarks on all measured VMs once a
day. If that ever becomes an issue we would probably prune the benchmarks
rather than turn them on/off selectively.
But speaking of benchmarks that won't work on other VMs (e.g. twisted under
Jython), we will obviously try to minimize how many of those we have.
Twisted is somewhat of a special case because (a) PyPy has already put the
time into creating the benchmarks and (b) it is used by so many people that
measuring its speed is a good thing. Otherwise I would argue that all
future benchmarks should be runnable on any VM and not just CPython or any
other VMs that support C extensions (numpy is the only exception to this
that I can think of because of its popularity and numpypy will extend its
reach once the work is complete).
On Thu, Feb 2, 2012 at 04:11, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
> On Wed, Feb 1, 2012 at 10:33 PM, Mark Shannon <mark(a)hotpy.org> wrote:
> > Brett Cannon wrote:
> >>
> >>
> >>
> > [snip]
> >>
> >>
> >> So, to prevent this from either ending up in a dead-end because of this,
> >> we need to first decide where the canonical set of Python VM benchmarks
> >> are going to live. I say hg.python.org/benchmarks
> >> <http://hg.python.org/benchmarks> for two reasons. One is that Antoine has
> >> already done work there to port some of the benchmarks so there is at
> >> least some there that are ready to be run under Python 3 (and the tooling
> >> is in place to create separate Python 2 and Python 3 benchmark suites).
> >> Two, this can be a test of having the various VM contributors work out of
> >> hg.python.org <http://hg.python.org> if we are ever going to break the
> >> stdlib out for shared development. At worst we can simply take the changes
> >> made at pypy/benchmarks that apply to just the unladen benchmarks that
> >> exists, and at best merge the two sets (manually) into one benchmark suite
> >> so PyPy doesn't lose anything for Python 2 measurements that they have
> >> written and CPython doesn't lose any of its Python 3 benchmarks that it
> >> has created.
> >>
> >> How does that sound?
> >>
> > Very sensible.
>
> +1 from me as well. Note that "we'll have a common set of benchmarks
> at python.org" sounds way more pleasant than "use a subrepo from
> python.org".
Great! Assuming no one runs with this and starts integration, we can
discuss it at PyCon and get a plan on how best to handle the merge.
Maciej Fijalkowski, 02.02.2012 14:35:
>> Oh, we have that feature, it's called CPython. The thing is that Cython
>> doesn't get to see the generated sources, so it won't compile them and
>> instead, CPython ends up executing the code at normal interpreted speed. So
>> there's nothing gained by running the benchmark at all. And even if we
>> found a way to hook into this machinery, I doubt that the static compiler
>> overhead would make this any useful. The whole purpose of generating code
>> is that it likely will not look the same the next time you do it (well,
>> outside of benchmarks, that is), so even a cache is unlikely to help much
>> for real code. It's like PyPy running code in interpreting mode before it
>> gets compiled, except that Cython will never compile this code, even if it
>> turns out to be worth it.
>>
>> Personally, I rather consider it a feature that users can employ exec()
>> from their Cython code to run code in plain CPython (for whatever reason).
>
> Yes, ok, but I believe this should mean "Cython does not give speedups
> on this benchmark" and not "we should modify the benchmark".
Oh, I hadn't suggested to modify it. I was merely stating (as part of a
longer list) that it's of no use specifically to Cython. I.e., if there's
something to gain from having the benchmark runs take less time by
disabling benchmarks for specific runtimes, it's one of the candidates on
our side.
Stefan
Maciej Fijalkowski, 02.02.2012 12:12:
> On Thu, Feb 2, 2012 at 1:09 PM, Maciej Fijalkowski wrote:
>> On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel wrote:
>>> We would like to join in on speed.python.org, once it's clear how the
>>> benchmarks will be run and how the data uploads work and all that. It
>>> already proved a bit tricky to get Cython integrated with the benchmark
>>> runner on our side, and I'm planning to rewrite that integration at some
>>> point, but it should already be doable to get "something" to work now.
>>
>> Can you come up with a script that does "cython <a python program>"?
>> that would simplify a lot
Yes, I have something like that, but it's a whole bunch of "do this, add
that, then run something". It mostly works (as you can see from the link
above), but it needs some serious reworking.
Basically, it compiles and starts the main program, and then enables
on-the-fly compilation of modules in sitecustomize.py by registering an
import hook. I'll see if I can get the script wrapped up a tiny bit so that
it becomes usable for speed.python.org.
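For the record, a rough sketch of the sitecustomize.py side of that, using the
pyximport hook that ships with Cython (the actual integration may well look
different):

    # sitecustomize.py -- rough sketch, not the real integration script.
    # pyximport ships with Cython; with pyimport=True it registers an import
    # hook that tries to compile plain .py modules on the fly and falls back
    # to normal interpretation when compilation fails.
    try:
        import pyximport
        pyximport.install(pyimport=True)
    except ImportError:
        pass  # Cython not installed: imports behave as usual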
Any way I could get an account on the machine? Would make it easier to test
it there.
>>> I should also note that we don't currently support the whole benchmark
>>> suite, so there must be a way to record individual benchmark results even
>>> in the face of failures in other benchmarks. Basically, speed.python.org
>>> would be useless for us if a failure in a single benchmark left us without
>>> any performance data at all, because it will still take us some time to get
>>> to 100% compliance and we would like to know if anything on that road has a
>>> performance impact. Currently, we apply a short patch that adds a
>>> try-except to the benchmark runner's main loop before starting the
>>> measurements, because otherwise it would just bail out completely on a
>>> single failure. Oh, and we also patch the benchmarks to remove references
>>> to __file__ because of CPython issue 13429, although we may be able to work
>>> around that at some point, specifically when doing on-the-fly compilation
>>> during imports.
>>
>> I think it's fine to mark certain benchmarks not to be runnable under
>> certain platforms. For example it's not like jython will run twisted
>> stuff.
... oh, and we'd like to know when it suddenly starts working. ;)
So, I think catching and ignoring (or logging) errors is the best way to go
about it.
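Roughly this shape, as a self-contained sketch rather than the actual perf.py
patch (the two stand-in benchmarks below are made up for illustration):

    # Sketch only: an error-tolerant main loop that records failures per
    # benchmark instead of aborting the whole run.
    def bench_ok():
        return sum(range(100000))

    def bench_unsupported():
        raise RuntimeError("feature not supported on this runtime")

    BENCHMARKS = {"ok": bench_ok, "unsupported": bench_unsupported}

    results, errors = {}, {}
    for name, func in BENCHMARKS.items():
        try:
            results[name] = func()
        except Exception as exc:
            errors[name] = repr(exc)  # log it, so we notice when it starts working

    print("ran:", sorted(results))   # -> ran: ['ok']
    print("failed:", errors)         # reported per benchmark, never fatal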
>>> Another test that is useless for us is the "mako" benchmark, because most
>>> of what it does is to run generated code. There is currently no way for
>>> Cython to hook into that, so we're out of the game here.
>>
>> Well, if you want cython to be considered python I think this is a
>> pretty crucial feature no?
Oh, we have that feature, it's called CPython. The thing is that Cython
doesn't get to see the generated sources, so it won't compile them and
instead, CPython ends up executing the code at normal interpreted speed. So
there's nothing gained by running the benchmark at all. And even if we
found a way to hook into this machinery, I doubt that the static compiler
overhead would make this any useful. The whole purpose of generating code
is that it likely will not look the same the next time you do it (well,
outside of benchmarks, that is), so even a cache is unlikely to help much
for real code. It's like PyPy running code in interpreting mode before it
gets compiled, except that Cython will never compile this code, even if it
turns out to be worth it.
Personally, I rather consider it a feature that users can employ exec()
from their Cython code to run code in plain CPython (for whatever reason).
Stefan
On Thu, Feb 2, 2012 at 1:09 PM, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
> On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel <stefan_ml(a)behnel.de> wrote:
>> Brett Cannon, 01.02.2012 18:25:
>>> to prevent this from either ending up in a dead-end because of this, we
>>> need to first decide where the canonical set of Python VM benchmarks are
>>> going to live. I say hg.python.org/benchmarks for two reasons. One is that
>>> Antoine has already done work there to port some of the benchmarks so there
>>> is at least some there that are ready to be run under Python 3 (and the
>>> tooling is in place to create separate Python 2 and Python 3 benchmark
>>> suites). Two, this can be a test of having the various VM contributors work
>>> out of hg.python.org if we are ever going to break the stdlib out for
>>> shared development. At worst we can simply take the changes made at
>>> pypy/benchmarks that apply to just the unladen benchmarks that exists, and
>>> at best merge the two sets (manually) into one benchmark suite so PyPy
>>> doesn't lose anything for Python 2 measurements that they have written and
>>> CPython doesn't lose any of its Python 3 benchmarks that it has created.
>>>
>>> How does that sound?
>>
>> +1
>>
>> FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7)
>> and that of hg.python.org (in Py2.7 and 3.3), but without codespeed
>> integration and also without a dedicated server for benchmark runs. So the
>> results are unfortunately not accurate enough to spot minor changes even
>> over time.
>>
>> https://sage.math.washington.edu:8091/hudson/view/bench/
>>
>> We would like to join in on speed.python.org, once it's clear how the
>> benchmarks will be run and how the data uploads work and all that. It
>> already proved a bit tricky to get Cython integrated with the benchmark
>> runner on our side, and I'm planning to rewrite that integration at some
>> point, but it should already be doable to get "something" to work now.
>
> Can you come up with a script that does "cython <a python program>"?
> that would simplify a lot
>
>>
>> I should also note that we don't currently support the whole benchmark
>> suite, so there must be a way to record individual benchmark results even
>> in the face of failures in other benchmarks. Basically, speed.python.org
>> would be useless for us if a failure in a single benchmark left us without
>> any performance data at all, because it will still take us some time to get
>> to 100% compliance and we would like to know if anything on that road has a
>> performance impact. Currently, we apply a short patch that adds a
>> try-except to the benchmark runner's main loop before starting the
>> measurements, because otherwise it would just bail out completely on a
>> single failure. Oh, and we also patch the benchmarks to remove references
>> to __file__ because of CPython issue 13429, although we may be able to work
>> around that at some point, specifically when doing on-the-fly compilation
>> during imports.
>
> I think it's fine to mark certain benchmarks not to be runnable under
> certain platforms. For example it's not like jython will run twisted
> stuff.
>
>>
>> http://bugs.python.org/issue13429
>>
>> Also note that benchmarks that only test C implemented stdlib modules (re,
>> pickle, json) are useless for Cython because they would only end up timing
>> the exact same code as for plain CPython.
>>
>> Another test that is useless for us is the "mako" benchmark, because most
>> of what it does is to run generated code. There is currently no way for
>> Cython to hook into that, so we're out of the game here.
>
> Well, if you want cython to be considered python I think this is a
> pretty crucial feature no?
>
>>
>> We also don't care about program startup tests, obviously, because we know
>> that Cython's compiler overhead plus an optimising gcc run will render them
>> meaningless anyway. I like the fact that there's still an old hg_startup
>> timing result lingering around from the time before I disabled that test,
>> telling us that Cython runs it 99.68% slower than CPython. Got to beat
>> that. 8-)
>
> That's probably okish.
Stefan, can you please not cross-post between mailing lists? Not
everyone is subscribed and people reading would get a confusing
half-of-the-world view.
Cheers,
fijal
Brett Cannon, 01.02.2012 18:25:
> to prevent this from either ending up in a dead-end because of this, we
> need to first decide where the canonical set of Python VM benchmarks are
> going to live. I say hg.python.org/benchmarks for two reasons. One is that
> Antoine has already done work there to port some of the benchmarks so there
> is at least some there that are ready to be run under Python 3 (and the
> tooling is in place to create separate Python 2 and Python 3 benchmark
> suites). Two, this can be a test of having the various VM contributors work
> out of hg.python.org if we are ever going to break the stdlib out for
> shared development. At worst we can simply take the changes made at
> pypy/benchmarks that apply to just the unladen benchmarks that exists, and
> at best merge the two sets (manually) into one benchmark suite so PyPy
> doesn't lose anything for Python 2 measurements that they have written and
> CPython doesn't lose any of its Python 3 benchmarks that it has created.
>
> How does that sound?
+1
FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7)
and that of hg.python.org (in Py2.7 and 3.3), but without codespeed
integration and also without a dedicated server for benchmark runs. So the
results are unfortunately not accurate enough to spot minor changes even
over time.
https://sage.math.washington.edu:8091/hudson/view/bench/
We would like to join in on speed.python.org, once it's clear how the
benchmarks will be run and how the data uploads work and all that. It
already proved a bit tricky to get Cython integrated with the benchmark
runner on our side, and I'm planning to rewrite that integration at some
point, but it should already be doable to get "something" to work now.
I should also note that we don't currently support the whole benchmark
suite, so there must be a way to record individual benchmark results even
in the face of failures in other benchmarks. Basically, speed.python.org
would be useless for us if a failure in a single benchmark left us without
any performance data at all, because it will still take us some time to get
to 100% compliance and we would like to know if anything on that road has a
performance impact. Currently, we apply a short patch that adds a
try-except to the benchmark runner's main loop before starting the
measurements, because otherwise it would just bail out completely on a
single failure. Oh, and we also patch the benchmarks to remove references
to __file__ because of CPython issue 13429, although we may be able to work
around that at some point, specifically when doing on-the-fly compilation
during imports.
http://bugs.python.org/issue13429
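As a purely hypothetical example of the __file__ problem (not the actual
patch): a benchmark that computes a data path from __file__ at module level
fails while the compiled extension module is still initialising, but deferring
the lookup into a function that only runs after the import has completed is
usually enough:

    import os

    _DATA_DIR = None

    def data_dir():
        # Resolved lazily: by the time a benchmark function calls this, the
        # import has finished and __file__ is set on the module, even when it
        # was compiled to an extension module.
        global _DATA_DIR
        if _DATA_DIR is None:
            _DATA_DIR = os.path.join(
                os.path.dirname(os.path.abspath(__file__)), "data")
        return _DATA_DIR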
Also note that benchmarks that only test C implemented stdlib modules (re,
pickle, json) are useless for Cython because they would only end up timing
the exact same code as for plain CPython.
Another test that is useless for us is the "mako" benchmark, because most
of what it does is to run generated code. There is currently no way for
Cython to hook into that, so we're out of the game here.
We also don't care about program startup tests, obviously, because we know
that Cython's compiler overhead plus an optimising gcc run will render them
meaningless anyway. I like the fact that there's still an old hg_startup
timing result lingering around from the time before I disabled that test,
telling us that Cython runs it 99.68% slower than CPython. Got to beat
that. 8-)
Stefan
Brett Cannon wrote:
>
>
[snip]
>
> So, to prevent this from either ending up in a dead-end because of this,
> we need to first decide where the canonical set of Python VM benchmarks
> are going to live. I say hg.python.org/benchmarks
> <http://hg.python.org/benchmarks> for two reasons. One is that Antoine
> has already done work there to port some of the benchmarks so there is
> at least some there that are ready to be run under Python 3 (and the
> tooling is in place to create separate Python 2 and Python 3 benchmark
> suites). Two, this can be a test of having the various VM contributors
> work out of hg.python.org <http://hg.python.org> if we are ever going to
> break the stdlib out for shared development. At worst we can simply take
> the changes made at pypy/benchmarks that apply to just the unladen
> benchmarks that exists, and at best merge the two sets (manually) into
> one benchmark suite so PyPy doesn't lose anything for Python 2
> measurements that they have written and CPython doesn't lose any of its
> Python 3 benchmarks that it has created.
>
> How does that sound?
>
Very sensible.
Cheers,
Mark.