Antoine Pitrou wrote:
[snip]
>
> By the way, the Mako benchmark shows a worrying regression (3x slower)
> on your new dict implementation.
Take a look at the timeline graph: is it very noisy?
There is a flaw in the benchmarking code in that the runs are not
interleaved, so other processes tend to introduce systematic errors.
For example, here is a run of mako (comparing my dict with tip):
### mako ###
Min: 0.805583 -> 0.839515: 1.04x slower
Avg: 0.831936 -> 0.910184: 1.09x slower
Significant (t=-3.25)
Stddev: 0.01302 -> 0.11953: 9.1820x larger
It is 9% slower, right?
Wrong. Take a look at the timeline:
http://tinyurl.com/82l9jna
It's 1-2% slower, but another process grabs the CPU for some of the
iterations.
This should not be a problem for speed.python.org as it will have a
dedicated machine, but you need to be careful when benchmarking on your
desktop machine.
As an experiment, try benchmarking a python build against itself and see
what you get.
Cheers,
Mark.
>
> But speaking of benchmarks that won't work on other VMs (e.g. twisted under
> Jython), we will obviously try to minimize how many of those we have.
> Twisted is somewhat of a special case because (a) PyPy has already put the
> time into creating the benchmarks and
It's an official Twisted benchmark suite, created by JP Calderone
(almost exclusively). We claim no credit.
Cheers,
fijal
On Thu, Feb 2, 2012 at 12:06 PM, Brett Cannon <brett(a)python.org> wrote:
>
>
> On Thu, Feb 2, 2012 at 08:42, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
>
>> On Thu, Feb 2, 2012 at 3:40 PM, Stefan Behnel <stefan_ml(a)behnel.de> wrote:
>> > Maciej Fijalkowski, 02.02.2012 14:35:
>> >>> Oh, we have that feature, it's called CPython. The thing is that Cython
>> >>> doesn't get to see the generated sources, so it won't compile them and
>> >>> instead, CPython ends up executing the code at normal interpreted speed.
>> >>> So there's nothing gained by running the benchmark at all. And even if we
>> >>> found a way to hook into this machinery, I doubt that the static compiler
>> >>> overhead would make this any useful. The whole purpose of generating code
>> >>> is that it likely will not look the same the next time you do it (well,
>> >>> outside of benchmarks, that is), so even a cache is unlikely to help much
>> >>> for real code. It's like PyPy running code in interpreting mode before it
>> >>> gets compiled, except that Cython will never compile this code, even if
>> >>> it turns out to be worth it.
>> >>>
>> >>> Personally, I rather consider it a feature that users can employ exec()
>> >>> from their Cython code to run code in plain CPython (for whatever reason).
>> >>
>> >> Yes, ok, but I believe this should mean "Cython does not give speedups
>> >> on this benchmark" and not "we should modify the benchmark".
>> >
>> > Oh, I hadn't suggested to modify it. I was merely stating (as part of a
>> > longer list) that it's of no use specifically to Cython. I.e., if there's
>> > something to gain from having the benchmark runs take less time by
>> > disabling benchmarks for specific runtimes, it's one of the candidates on
>> > our side.
>> >
>> > Stefan
>>
>> Oh ok, I misread you then, sorry.
>>
>> I think having a dedicated speed.python machine is specifically so
>> that we can run benchmarks however much we want :) At least Cython
>> does not compile for like an hour...
>
>
> Yeah, we have tried to make sure the machine we have for all of this is
> fast enough that we can run all the benchmarks on all measured VMs once a
> day. If that ever becomes an issue we would probably prune the benchmarks
> rather than turn them on/off selectively.
>
> But speaking of benchmarks that won't work on other VMs (e.g. twisted
> under Jython), we will obviously try to minimize how many of those we have.
> Twisted is somewhat of a special case because (a) PyPy has already put the
> time into creating the benchmarks and (b) it is used by so many people that
> measuring its speed is a good thing. Otherwise I would argue that all
> future benchmarks should be runnable on any VM and not just CPython or any
> other VMs that support C extensions (numpy is the only exception to this
> that I can think of because of its popularity and numpypy will extend its
> reach once the work is complete).
>
>
>
I think, ideally, the benchmarks should all be valid pure Python. Without
wanting to beat up on Jython, I don't think the fact that Twisted doesn't run
there is an argument against including it.
Alex
--
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
On Thu, Feb 2, 2012 at 08:42, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
> On Thu, Feb 2, 2012 at 3:40 PM, Stefan Behnel <stefan_ml(a)behnel.de> wrote:
> > Maciej Fijalkowski, 02.02.2012 14:35:
> >>> Oh, we have that feature, it's called CPython. The thing is that Cython
> >>> doesn't get to see the generated sources, so it won't compile them and
> >>> instead, CPython ends up executing the code at normal interpreted speed.
> >>> So there's nothing gained by running the benchmark at all. And even if we
> >>> found a way to hook into this machinery, I doubt that the static compiler
> >>> overhead would make this any useful. The whole purpose of generating code
> >>> is that it likely will not look the same the next time you do it (well,
> >>> outside of benchmarks, that is), so even a cache is unlikely to help much
> >>> for real code. It's like PyPy running code in interpreting mode before it
> >>> gets compiled, except that Cython will never compile this code, even if it
> >>> turns out to be worth it.
> >>>
> >>> Personally, I rather consider it a feature that users can employ exec()
> >>> from their Cython code to run code in plain CPython (for whatever reason).
> >>
> >> Yes, ok, but I believe this should mean "Cython does not give speedups
> >> on this benchmark" and not "we should modify the benchmark".
> >
> > Oh, I hadn't suggested to modify it. I was merely stating (as part of a
> > longer list) that it's of no use specifically to Cython. I.e., if there's
> > something to gain from having the benchmark runs take less time by
> > disabling benchmarks for specific runtimes, it's one of the candidates on
> > our side.
> >
> > Stefan
>
> Oh ok, I misread you then, sorry.
>
> I think having a dedicated speed.python machine is specifically so
> that we can run benchmarks however much we want :) At least Cython
> does not compile for like an hour...
Yeah, we have tried to make sure the machine we have for all of this is
fast enough that we can run all the benchmarks on all measured VMs once a
day. If that ever becomes an issue we would probably prune the benchmarks
rather than turn them on/off selectively.
But speaking of benchmarks that won't work on other VMs (e.g. twisted under
Jython), we will obviously try to minimize how many of those we have.
Twisted is somewhat of a special case because (a) PyPy has already put the
time into creating the benchmarks and (b) it is used by so many people that
measuring its speed is a good thing. Otherwise I would argue that all
future benchmarks should be runnable on any VM and not just CPython or any
other VMs that support C extensions (numpy is the only exception to this
that I can think of because of its popularity and numpypy will extend its
reach once the work is complete).
On Thu, Feb 2, 2012 at 04:11, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
> On Wed, Feb 1, 2012 at 10:33 PM, Mark Shannon <mark(a)hotpy.org> wrote:
> > Brett Cannon wrote:
> >>
> >>
> >>
> > [snip]
> >>
> >>
> >> So, to prevent this from either ending up in a dead-end because of this,
> >> we need to first decide where the canonical set of Python VM benchmarks
> >> are going to live. I say hg.python.org/benchmarks
> >> <http://hg.python.org/benchmarks> for two reasons. One is that Antoine has
> >> already done work there to port some of the benchmarks so there is at
> >> least some there that are ready to be run under Python 3 (and the tooling
> >> is in place to create separate Python 2 and Python 3 benchmark suites).
> >> Two, this can be a test of having the various VM contributors work out of
> >> hg.python.org <http://hg.python.org> if we are ever going to break the
> >> stdlib out for shared development. At worst we can simply take the changes
> >> made at pypy/benchmarks that apply to just the unladen benchmarks that
> >> exists, and at best merge the two sets (manually) into one benchmark suite
> >> so PyPy doesn't lose anything for Python 2 measurements that they have
> >> written and CPython doesn't lose any of its Python 3 benchmarks that it
> >> has created.
> >>
> >> How does that sound?
> >>
> > Very sensible.
>
> +1 from me as well. Note that "we'll have a common set of benchmarks
> at python.org" sounds way more pleasant than "use a subrepo from
> python.org".
Great! Assuming no one runs with this and starts integration, we can
discuss it at PyCon and get a plan on how best to handle the merge.
Maciej Fijalkowski, 02.02.2012 14:35:
>> Oh, we have that feature, it's called CPython. The thing is that Cython
>> doesn't get to see the generated sources, so it won't compile them and
>> instead, CPython ends up executing the code at normal interpreted speed. So
>> there's nothing gained by running the benchmark at all. And even if we
>> found a way to hook into this machinery, I doubt that the static compiler
>> overhead would make this any useful. The whole purpose of generating code
>> is that it likely will not look the same the next time you do it (well,
>> outside of benchmarks, that is), so even a cache is unlikely to help much
>> for real code. It's like PyPy running code in interpreting mode before it
>> gets compiled, except that Cython will never compile this code, even if it
>> turns out to be worth it.
>>
>> Personally, I rather consider it a feature that users can employ exec()
>> from their Cython code to run code in plain CPython (for whatever reason).
>
> Yes, ok, but I believe this should mean "Cython does not give speedups
> on this benchmark" and not "we should modify the benchmark".
Oh, I hadn't suggested to modify it. I was merely stating (as part of a
longer list) that it's of no use specifically to Cython. I.e., if there's
something to gain from having the benchmark runs take less time by
disabling benchmarks for specific runtimes, it's one of the candidates on
our side.
Stefan
Maciej Fijalkowski, 02.02.2012 12:12:
> On Thu, Feb 2, 2012 at 1:09 PM, Maciej Fijalkowski wrote:
>> On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel wrote:
>>> We would like to join in on speed.python.org, once it's clear how the
>>> benchmarks will be run and how the data uploads work and all that. It
>>> already proved a bit tricky to get Cython integrated with the benchmark
>>> runner on our side, and I'm planning to rewrite that integration at some
>>> point, but it should already be doable to get "something" to work now.
>>
>> Can you come up with a script that does "cython <a python program>"?
>> that would simplify a lot
Yes, I have something like that, but it's a whole bunch of "do this, add
that, then run something". It mostly works (as you can see from the link
above), but it needs some serious reworking.
Basically, it compiles and starts the main program, and then enables
on-the-fly compilation of modules in sitecustomize.py by registering an
import hook. I'll see if I can get the script wrapped up a tiny bit so that
it becomes usable for speed.python.org.
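For the record, a rough sketch of the sitecustomize.py side of that, using the
pyximport hook that ships with Cython (the actual integration may well look
different):

    # sitecustomize.py -- rough sketch, not the real integration script.
    # pyximport ships with Cython; with pyimport=True it registers an import
    # hook that tries to compile plain .py modules on the fly and falls back
    # to normal interpretation when compilation fails.
    try:
        import pyximport
        pyximport.install(pyimport=True)
    except ImportError:
        pass  # Cython not installed: imports behave as usual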
Any way I could get an account on the machine? Would make it easier to test
it there.
>>> I should also note that we don't currently support the whole benchmark
>>> suite, so there must be a way to record individual benchmark results even
>>> in the face of failures in other benchmarks. Basically, speed.python.org
>>> would be useless for us if a failure in a single benchmark left us without
>>> any performance data at all, because it will still take us some time to get
>>> to 100% compliance and we would like to know if anything on that road has a
>>> performance impact. Currently, we apply a short patch that adds a
>>> try-except to the benchmark runner's main loop before starting the
>>> measurements, because otherwise it would just bail out completely on a
>>> single failure. Oh, and we also patch the benchmarks to remove references
>>> to __file__ because of CPython issue 13429, although we may be able to work
>>> around that at some point, specifically when doing on-the-fly compilation
>>> during imports.
>>
>> I think it's fine to mark certain benchmarks not to be runnable under
>> certain platforms. For example it's not like jython will run twisted
>> stuff.
... oh, and we'd like to know when it suddenly starts working. ;)
So, I think catching and ignoring (or logging) errors is the best way to go
about it.
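Roughly this shape, as a self-contained sketch rather than the actual perf.py
patch (the two stand-in benchmarks below are made up for illustration):

    # Sketch only: an error-tolerant main loop that records failures per
    # benchmark instead of aborting the whole run.
    def bench_ok():
        return sum(range(100000))

    def bench_unsupported():
        raise RuntimeError("feature not supported on this runtime")

    BENCHMARKS = {"ok": bench_ok, "unsupported": bench_unsupported}

    results, errors = {}, {}
    for name, func in BENCHMARKS.items():
        try:
            results[name] = func()
        except Exception as exc:
            errors[name] = repr(exc)  # log it, so we notice when it starts working

    print("ran:", sorted(results))   # -> ran: ['ok']
    print("failed:", errors)         # reported per benchmark, never fatal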
>>> Another test that is useless for us is the "mako" benchmark, because most
>>> of what it does is to run generated code. There is currently no way for
>>> Cython to hook into that, so we're out of the game here.
>>
>> Well, if you want cython to be considered python I think this is a
>> pretty crucial feature no?
Oh, we have that feature, it's called CPython. The thing is that Cython
doesn't get to see the generated sources, so it won't compile them and
instead, CPython ends up executing the code at normal interpreted speed. So
there's nothing gained by running the benchmark at all. And even if we
found a way to hook into this machinery, I doubt that the static compiler
overhead would make this any useful. The whole purpose of generating code
is that it likely will not look the same the next time you do it (well,
outside of benchmarks, that is), so even a cache is unlikely to help much
for real code. It's like PyPy running code in interpreting mode before it
gets compiled, except that Cython will never compile this code, even if it
turns out to be worth it.
Personally, I rather consider it a feature that users can employ exec()
from their Cython code to run code in plain CPython (for whatever reason).
Stefan
On Thu, Feb 2, 2012 at 1:09 PM, Maciej Fijalkowski <fijall(a)gmail.com> wrote:
> On Thu, Feb 2, 2012 at 10:21 AM, Stefan Behnel <stefan_ml(a)behnel.de> wrote:
>> Brett Cannon, 01.02.2012 18:25:
>>> to prevent this from either ending up in a dead-end because of this, we
>>> need to first decide where the canonical set of Python VM benchmarks are
>>> going to live. I say hg.python.org/benchmarks for two reasons. One is that
>>> Antoine has already done work there to port some of the benchmarks so there
>>> is at least some there that are ready to be run under Python 3 (and the
>>> tooling is in place to create separate Python 2 and Python 3 benchmark
>>> suites). Two, this can be a test of having the various VM contributors work
>>> out of hg.python.org if we are ever going to break the stdlib out for
>>> shared development. At worst we can simply take the changes made at
>>> pypy/benchmarks that apply to just the unladen benchmarks that exists, and
>>> at best merge the two sets (manually) into one benchmark suite so PyPy
>>> doesn't lose anything for Python 2 measurements that they have written and
>>> CPython doesn't lose any of its Python 3 benchmarks that it has created.
>>>
>>> How does that sound?
>>
>> +1
>>
>> FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7)
>> and that of hg.python.org (in Py2.7 and 3.3), but without codespeed
>> integration and also without a dedicated server for benchmark runs. So the
>> results are unfortunately not accurate enough to spot minor changes even
>> over time.
>>
>> https://sage.math.washington.edu:8091/hudson/view/bench/
>>
>> We would like to join in on speed.python.org, once it's clear how the
>> benchmarks will be run and how the data uploads work and all that. It
>> already proved a bit tricky to get Cython integrated with the benchmark
>> runner on our side, and I'm planning to rewrite that integration at some
>> point, but it should already be doable to get "something" to work now.
>
> Can you come up with a script that does "cython <a python program>"?
> that would simplify a lot
>
>>
>> I should also note that we don't currently support the whole benchmark
>> suite, so there must be a way to record individual benchmark results even
>> in the face of failures in other benchmarks. Basically, speed.python.org
>> would be useless for us if a failure in a single benchmark left us without
>> any performance data at all, because it will still take us some time to get
>> to 100% compliance and we would like to know if anything on that road has a
>> performance impact. Currently, we apply a short patch that adds a
>> try-except to the benchmark runner's main loop before starting the
>> measurements, because otherwise it would just bail out completely on a
>> single failure. Oh, and we also patch the benchmarks to remove references
>> to __file__ because of CPython issue 13429, although we may be able to work
>> around that at some point, specifically when doing on-the-fly compilation
>> during imports.
>
> I think it's fine to mark certain benchmarks not to be runnable under
> certain platforms. For example it's not like jython will run twisted
> stuff.
>
>>
>> http://bugs.python.org/issue13429
>>
>> Also note that benchmarks that only test C implemented stdlib modules (re,
>> pickle, json) are useless for Cython because they would only end up timing
>> the exact same code as for plain CPython.
>>
>> Another test that is useless for us is the "mako" benchmark, because most
>> of what it does is to run generated code. There is currently no way for
>> Cython to hook into that, so we're out of the game here.
>
> Well, if you want cython to be considered python I think this is a
> pretty crucial feature no?
>
>>
>> We also don't care about program startup tests, obviously, because we know
>> that Cython's compiler overhead plus an optimising gcc run will render them
>> meaningless anyway. I like the fact that there's still an old hg_startup
>> timing result lingering around from the time before I disabled that test,
>> telling us that Cython runs it 99.68% slower than CPython. Got to beat
>> that. 8-)
>
> That's probably okish.
Stefan, can you please not cross-post between mailing lists? Not
everyone is subscribed and people reading would get a confusing
half-of-the-world view.
Cheers,
fijal
Brett Cannon, 01.02.2012 18:25:
> to prevent this from either ending up in a dead-end because of this, we
> need to first decide where the canonical set of Python VM benchmarks are
> going to live. I say hg.python.org/benchmarks for two reasons. One is that
> Antoine has already done work there to port some of the benchmarks so there
> is at least some there that are ready to be run under Python 3 (and the
> tooling is in place to create separate Python 2 and Python 3 benchmark
> suites). Two, this can be a test of having the various VM contributors work
> out of hg.python.org if we are ever going to break the stdlib out for
> shared development. At worst we can simply take the changes made at
> pypy/benchmarks that apply to just the unladen benchmarks that exists, and
> at best merge the two sets (manually) into one benchmark suite so PyPy
> doesn't lose anything for Python 2 measurements that they have written and
> CPython doesn't lose any of its Python 3 benchmarks that it has created.
>
> How does that sound?
+1
FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7)
and that of hg.python.org (in Py2.7 and 3.3), but without codespeed
integration and also without a dedicated server for benchmark runs. So the
results are unfortunately not accurate enough to spot minor changes even
over time.
https://sage.math.washington.edu:8091/hudson/view/bench/
We would like to join in on speed.python.org, once it's clear how the
benchmarks will be run and how the data uploads work and all that. It
already proved a bit tricky to get Cython integrated with the benchmark
runner on our side, and I'm planning to rewrite that integration at some
point, but it should already be doable to get "something" to work now.
I should also note that we don't currently support the whole benchmark
suite, so there must be a way to record individual benchmark results even
in the face of failures in other benchmarks. Basically, speed.python.org
would be useless for us if a failure in a single benchmark left us without
any performance data at all, because it will still take us some time to get
to 100% compliance and we would like to know if anything on that road has a
performance impact. Currently, we apply a short patch that adds a
try-except to the benchmark runner's main loop before starting the
measurements, because otherwise it would just bail out completely on a
single failure. Oh, and we also patch the benchmarks to remove references
to __file__ because of CPython issue 13429, although we may be able to work
around that at some point, specifically when doing on-the-fly compilation
during imports.
http://bugs.python.org/issue13429
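As a purely hypothetical example of the __file__ problem (not the actual
patch): a benchmark that computes a data path from __file__ at module level
fails while the compiled extension module is still initialising, but deferring
the lookup into a function that only runs after the import has completed is
usually enough:

    import os

    _DATA_DIR = None

    def data_dir():
        # Resolved lazily: by the time a benchmark function calls this, the
        # import has finished and __file__ is set on the module, even when it
        # was compiled to an extension module.
        global _DATA_DIR
        if _DATA_DIR is None:
            _DATA_DIR = os.path.join(
                os.path.dirname(os.path.abspath(__file__)), "data")
        return _DATA_DIR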
Also note that benchmarks that only test C implemented stdlib modules (re,
pickle, json) are useless for Cython because they would only end up timing
the exact same code as for plain CPython.
Another test that is useless for us is the "mako" benchmark, because most
of what it does is to run generated code. There is currently no way for
Cython to hook into that, so we're out of the game here.
We also don't care about program startup tests, obviously, because we know
that Cython's compiler overhead plus an optimising gcc run will render them
meaningless anyway. I like the fact that there's still an old hg_startup
timing result lingering around from the time before I disabled that test,
telling us that Cython runs it 99.68% slower than CPython. Got to beat
that. 8-)
Stefan
Brett Cannon wrote:
>
>
[snip]
>
> So, to prevent this from either ending up in a dead-end because of this,
> we need to first decide where the canonical set of Python VM benchmarks
> are going to live. I say hg.python.org/benchmarks
> <http://hg.python.org/benchmarks> for two reasons. One is that Antoine
> has already done work there to port some of the benchmarks so there is
> at least some there that are ready to be run under Python 3 (and the
> tooling is in place to create separate Python 2 and Python 3 benchmark
> suites). Two, this can be a test of having the various VM contributors
> work out of hg.python.org <http://hg.python.org> if we are ever going to
> break the stdlib out for shared development. At worst we can simply take
> the changes made at pypy/benchmarks that apply to just the unladen
> benchmarks that exists, and at best merge the two sets (manually) into
> one benchmark suite so PyPy doesn't lose anything for Python 2
> measurements that they have written and CPython doesn't lose any of its
> Python 3 benchmarks that it has created.
>
> How does that sound?
>
Very sensible.
Cheers,
Mark.