Really crude draft of a vbench setup for NumPy (np.add.reduce benchmarks since 2011)
On Wed, 01 May 2013, Sebastian Berg wrote:
`a = np.random.random((1000, 1000))`
FWIW -- just as a crude first attempt, have a look at http://www.onerussian.com/tmp/numpy-vbench-20130506/vb_vb_reduce.html -- why is the float16 case so special?

I have pushed this really coarse setup (based on a somewhat elderly copy of pandas' vbench) to https://github.com/yarikoptic/numpy-vbench if you care to tune it up or extend it, and then I could fire it up again on that box (which doesn't do anything else ATM, AFAIK). Since the majority of the time is spent actually building NumPy (even with ccache), it would be neat if you came up with more benchmarks to run that you think could be interesting/important.

-- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
On Mon, 2013-05-06 at 10:32 -0400, Yaroslav Halchenko wrote:
Float16 is special: it is CPU-bound -- not memory-bound like most reductions -- because it is not a native type. At first I thought it was weird, but it actually makes sense. If you have a and b as float16, then a + b is actually more like (I believe...) float16(float32(a) + float32(b)). This means there is type casting going on *inside* the ufunc! Normally casting is handled outside the ufunc (by the buffered iterator). Now, I did not check, but when the iteration order is not optimized, the ufunc *can* simplify this to something like the following (along the reduction axis):

    result = float32(a[0])
    for x in a[1:]:
        result += float32(x)
    return float16(result)

For "optimized" iteration order this cannot happen, because the intermediate result is always written back. So for optimized iteration order a single conversion to float32 is needed in the inner loop, while for unoptimized iteration order two conversions to float32 and one back are done. Since this conversion is costly, memory throughput is actually not important (no gain from buffering), and that leads to the visible slowdown. This is of course a bit annoying, but I am not sure how you would solve it. (Have the dtype signal that it doesn't even want iteration-order optimization? Try to move those weird float16 conversions from the ufunc to the iterator somehow?)
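The float16 effect described above can be observed directly with a snippet like this (illustrative, not from the thread; absolute timings are machine-dependent):

```python
import numpy as np
import timeit

a16 = np.ones((1000, 1000), dtype=np.float16)
a32 = np.ones((1000, 1000), dtype=np.float32)

# float16 has no native CPU arithmetic: each element is upcast to
# float32 inside the ufunc inner loop, making the reduction CPU-bound.
t16 = min(timeit.repeat(lambda: np.add.reduce(a16, axis=0), repeat=3, number=10))
t32 = min(timeit.repeat(lambda: np.add.reduce(a32, axis=0), repeat=3, number=10))
print("float16: %.4f s, float32: %.4f s" % (t16, t32))
```

On most hardware the float16 reduction is noticeably slower despite moving half as many bytes.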
I think this is pretty cool! It would probably be a while until there are many tests, but if you or someone could set such a thing up, it could slowly grow as larger code changes are made? Regards, Sebastian
On Mon, 06 May 2013, Sebastian Berg wrote:
That is the idea, but it would be nice to gather such simple benchmark tests. If you could hint at the NumPy functionality you think is especially worth benchmarking (I know -- there are a lot of things which could be benchmarked), that would be a nice starting point: just list the functionality/functions you consider of primary interest, and whether each is worth testing across different dtypes or just as a gross estimate (e.g. for a selection of dtypes in a loop).

As for myself -- I guess I will add fancy indexing and slicing tests. Adding them is quite easy: have a look at https://github.com/yarikoptic/numpy-vbench/blob/master/vb_reduce.py which is actually a bit more cumbersome because it runs the benchmarks for different dtypes. This one is more obvious: https://github.com/yarikoptic/numpy-vbench/blob/master/vb_io.py

-- Yaroslav
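For reference, the entries in those files boil down to a setup snippet plus a statement that gets timed; the same pattern can be reproduced standalone with plain timeit (an illustrative sketch, not the actual vbench API):

```python
import timeit

# vbench-style benchmark: a setup string and the statement under test,
# timed as best-of-repeats to suppress scheduling noise.
setup = "import numpy as np; a = np.random.random((1000, 1000))"
stmt = "np.add.reduce(a, axis=0)"

best = min(timeit.Timer(stmt, setup=setup).repeat(repeat=3, number=10)) / 10
print("best per-call time: %.6f s" % best)
```

Each benchmark in the repository is essentially such a (setup, statement) pair plus a name under which the timing history is tracked per commit.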
On Mon, 2013-05-06 at 12:11 -0400, Yaroslav Halchenko wrote:
Indexing/assignment was the first thing I thought of too (also because fancy indexing/assignment really could use some speedups...). Other than that, maybe some timings for small-array/scalar math, though that might be nicer for the GSoC project. Maybe array creation functions, just to see whether performance bugs sneak into something that central. But I can't think of anything else that isn't specific functionality. - Sebastian
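Candidate micro-benchmarks along those lines might look like this (hypothetical examples, not taken from the repository):

```python
import numpy as np
import timeit

a = np.random.random(1000000)
idx = np.random.randint(0, a.size, 10000)
mask = np.zeros(a.size, dtype=bool)
mask[idx] = True

# fancy-indexing read vs boolean-mask read: two of the access patterns
# whose regressions such a suite would be expected to catch
t_fancy = min(timeit.repeat(lambda: a[idx], repeat=3, number=100))
t_mask = min(timeit.repeat(lambda: a[mask], repeat=3, number=100))
print("fancy: %.4f s, mask: %.4f s" % (t_fancy, t_mask))
```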
On 7 May 2013 13:47, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Why not go bigger? Ufunc operations on big arrays, both CPU- and memory-bound. Also, what about interfacing with other packages? It may increase the compilation overhead, but I would like to see Cython in action (say, only the last version; maybe it can be fixed).
Hi Guys,

Not quite the recommendations you expressed, but here is my ugly attempt to improve benchmark coverage: http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html

Initially I also ran those ufunc benchmarks for each dtype separately, but the resulting webpage got so long that Firefox brought my laptop to its knees, so I commented those out for now and left only the "summary" ones across multiple dtypes. There is a bug in sphinx which forbids embedding some figures for vb_random "as is", so pardon that for now...

I have not set the CPU affinity of the process (though I ran it at nice -10), so that may also have contributed to the variance of the benchmark estimates. There are probably more goodies (e.g. gc control etc.) to borrow from https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have just discovered, to minimize variance.

Nothing really interesting has been pin-pointed so far, besides that:

- svd became a bit faster a few months back ;-) http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html
- isnan (and isinf, isfinite) got improved http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-i...
- right_shift got a minuscule slowdown from what it used to be? http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-r...

As before, the current code of this benchmark collection is available at http://github.com/yarikoptic/numpy-vbench/pull/new/master -- if you have specific snippets you would like to benchmark, just state them here or send a PR and I will add them in.

Cheers,

On Tue, 07 May 2013, Daπid wrote:
Why not going bigger? Ufunc operations on big arrays, CPU and memory bound.
FWIW -- updated plots, with a contribution from Julian Taylor: http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap... ;-) On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
Julian Taylor contributed some benchmarks he was "concerned" about, so now the collection is even better. I will keep updating the tests at the same URL: http://www.onerussian.com/tmp/numpy-vbench/ [it is running now; later I will upload results with more commits for higher temporal fidelity]. Of particular interest for you might be:

- some minor but consistent recent losses in http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-floa... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-floa... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int1... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int8
- memcpy on int8 seems to have lost more than 25% of its performance over the timeline http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#memcpy-int8
- "fast" calls to all/any seem to have been hurt twice in their lifetime and now run *3 times slower* than in 2011 -- the inflection points correspond to regressions and/or their fixes in those functions to bring back performance in the "slow" cases (when array traversal is needed, e.g. on arrays of zeros for any) http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-all-fast http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-fast

Enjoy

On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
And to put the findings reported so far into some kind of automated form, please welcome http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis

This is based on a simple 1-way ANOVA of the last 10 commits against a point in the past where 10 other commits had the smallest timings and were significantly different from the last 10 commits. "Possible recent" is probably too noisy and I am not sure it is useful -- it should point to the diff closest in time (to the latest commits) where a significant excursion from current performance was detected. So per se it has nothing to do with the initially detected performance hit, but in some cases it still seems to reasonably locate the commits hitting performance.

Enjoy,

On Tue, 09 Jul 2013, Yaroslav Halchenko wrote:
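The detection described above boils down to a one-way ANOVA between two groups of timings. A numpy-only sketch (illustrative data; the real code lives in vbench's analysis module and may differ in details):

```python
import numpy as np

def f_oneway(g1, g2):
    # One-way ANOVA F statistic for two groups (manual, numpy-only).
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    n1, n2 = len(g1), len(g2)
    grand = np.concatenate([g1, g2]).mean()
    ss_between = n1 * (g1.mean() - grand) ** 2 + n2 * (g2.mean() - grand) ** 2
    ss_within = ((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()
    df_between, df_within = 1, n1 + n2 - 2
    return (ss_between / df_between) / (ss_within / df_within)

# Illustrative: timings (ms) for the 10 best past commits vs the last 10.
past = np.array([10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.8, 10.0, 10.1, 9.9])
recent = np.array([13.0, 12.8, 13.1, 12.9, 13.2, 13.0, 12.7, 13.1, 12.9, 13.0])
F = f_oneway(past, recent)
print("F = %.1f" % F)  # large F flags a significant slowdown
```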
Julian Taylor contributed some benchmarks he was "concerned" about, so now the collection is even better.
seems have lost more than 25% of performance throughout the timeline http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#memcpy-int8
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-all-fast http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-fast
Enjoy
I have just added a few more benchmarks, and here they come: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_linalg.html#numpy-linalg-pi... It seems to be very recent, so my only check, based on 10 commits, didn't pick it up yet, and it is not present in the summary table. Could it be related to the 80%-faster det()? ;) norm was hit as well, a bit earlier; it might well be within these commits: https://github.com/numpy/numpy/compare/24a0aa5...29dcc54 I will now rerun benchmarking for the rest of the commits (I was only running the last commit of each day, IIRC). Cheers, On Tue, 16 Jul 2013, Yaroslav Halchenko wrote:
and to put so far reported findings into some kind of automated form, please welcome
http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis
The biggest ~recent change in master's linalg was the switch to gufunc backends -- you might want to check for that event in your commit log. On 19 Jul 2013 23:08, "Yaroslav Halchenko" <lists@onerussian.com> wrote:
On 20.07.2013 01:38, Nathaniel Smith wrote:
The biggest ~recent change in master's linalg was the switch to gufunc back ends - you might want to check for that event in your commit log.
That was in mid-April, which doesn't match the location of the uptick in the graph. Pauli
At some point I hope to tune up the report with an option to view the plots using e.g. the nvd3 JS library, so it would be easier to pinpoint and analyze things interactively. On Sat, 20 Jul 2013, Pauli Virtanen wrote:
That was in mid-April, which doesn't match with the location of the uptick in the graph.
On Mon, 22 Jul 2013, Benjamin Root wrote:
"that's just sick!" do you know about any motion in python-sphinx world on supporting it? is there any demo page you would recommend to assess what to expect supported in upcoming webagg? -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
On Mon, Jul 22, 2013 at 1:28 PM, Yaroslav Halchenko <lists@onerussian.com>wrote:
Oldie but goodie: http://mdboom.github.io/blog/2012/10/11/matplotlib-in-the-browser-its-coming... Official announcement: http://matplotlib.org/1.3.0/users/whats_new.html#webagg-backend Note, this is different from what is now available in the IPython Notebook (it isn't really interactive there). As for what is supported: just about everything you can do normally can be done in WebAgg. I have no clue about sphinx-level support. Now, back to your regularly scheduled program. Cheers! Ben Root
On 7/19/13, Yaroslav Halchenko <lists@onerussian.com> wrote:
Well, this is embarrassing: https://github.com/numpy/numpy/pull/3539 Thanks for the benchmarks! I'm now an even bigger fan. :) Warren
On Fri, 19 Jul 2013, Warren Weckesser wrote:
Well, this is embarrassing: https://github.com/numpy/numpy/pull/3539
Thanks for benchmarks! I'm now an even bigger fan. :)
Great to see that those came in handy! I had thought to provide full details (benchmarking all recent commits) to pin down the exact point of the regression, but embarrassingly I made that run outside of the benchmarking chroot, so consistency was not guaranteed. Anyway -- rerunning it correctly now (with recent commits included). -- Yaroslav
Added some basic constructor benchmarks: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html Quite a few fresh enhancements are visible (cool), but also some freshly discovered elderly hits, e.g. http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-identity-10... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-ones-100 Cheers, On Fri, 19 Jul 2013, Yaroslav Halchenko wrote:
could well be related to 80% faster det()? ;)
I am glad to announce that you can now see benchmark timing plots for multiple branches, making it possible to spot regressions in maintenance branches and to compare enhancements against previous releases. E.g.:

- improving upon 1.7.x but still lagging behind 1.6.x: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-identity-10... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_function_base.html#percenti... ...
- what seems to be a regression caught/fixed in 1.7.x: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_indexing.html#a-indexes-flo...
- or not (yet) fixed in 1.7.x: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-comp...

Summary-table generation is not yet adjusted for these multi-branch changes, so there might be misleading results there: http://www.onerussian.com/tmp/numpy-vbench/index.html#benchmarks-performance...

Cheers, -- Yaroslav
FWIW -- updated runs of the benchmarks are available at http://yarikoptic.github.io/numpy-vbench which now also includes the maintenance/1.8.x branch (no divergences detected yet). I see only recent improvements and no new performance regressions (though some old ones are still there; some might be specific to my CPU). Cheers, -- Yaroslav
On Fri, 06 Sep 2013, Daπid wrote:
some old ones are still there, some might be specific to my CPU here
How long does one run take? Maybe I can run it on my machine (Intel i5) for comparison.
In the current configuration, where I "target" each benchmark at around 200 ms per run (thus possibly jumping up to 400 ms), and thus 1-2 s for the 3 actual runs needed to take the minimum among them, it takes that elderly box about a day to run the "end of the day" commits (IIRC around 400 of them) and then 3-4 days for a full run (all commits). I am not sure if targeting 200 ms is of any benefit as opposed to 100 ms, which would run twice as fast.

You are welcome to give it a shot right away: http://github.com/yarikoptic/numpy-vbench It is still a bit ad hoc, and I also use an additional shell wrapper to set the CPU affinity (taskset -cp 1) and to renice the benchmarking process to -10.

-- Yaroslav
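The affinity and gc parts of such variance control can also be done from within Python on Linux (a sketch under stated assumptions: `os.sched_setaffinity` is Linux-specific, and renice is omitted since it needs privileges):

```python
import gc
import os
import timeit

# Pin the process to a single allowed core to avoid CPU migrations
# (similar in spirit to `taskset -cp 1`; Linux-only).
if hasattr(os, "sched_setaffinity"):
    cpu = min(os.sched_getaffinity(0))
    os.sched_setaffinity(0, {cpu})

# Disable the garbage collector around the timed section to cut variance.
gc.disable()
try:
    t = min(timeit.repeat("sum(range(1000))", repeat=3, number=1000))
finally:
    gc.enable()
print("best of 3: %.6f s" % t)
```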
On Fri, Sep 6, 2013 at 3:21 PM, Yaroslav Halchenko <lists@onerussian.com> wrote:
You have enough data to add some quality-control bands to the charts (a CUSUM chart, for example). Then it would be possible to send a congratulatory note or ring an alarm bell without looking at all the plots. Josef
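Josef's CUSUM suggestion could be prototyped along these lines (an illustrative sketch with arbitrary thresholds, not part of the vbench code):

```python
import numpy as np

def cusum_upper(x, target, k=0.5, h=5.0):
    """One-sided (upper) CUSUM over standardized timings.

    Returns the first index where cumulative positive drift from
    `target` exceeds h standard deviations, else None.
    k and h are illustrative reference/decision parameters.
    """
    x = np.asarray(x, float)
    sigma = x.std() or 1.0
    z = (x - target) / sigma
    s = 0.0
    for i, zi in enumerate(z):
        s = max(0.0, s + zi - k)
        if s > h:
            return i  # alarm: sustained slowdown detected around here
    return None

# Per-commit timings (ms) with a step change in the mean partway through.
timings = [10.0] * 30 + [10.6] * 30
print(cusum_upper(timings, target=10.0))
```

The alarm fires a few commits after the step change, which is exactly the "ring a bell without eyeballing plots" behavior suggested above.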
On Fri, 06 Sep 2013, josef.pktd@gmail.com wrote:
Well -- I did cook up a basic "detector", but I believe I haven't adjusted it for multiple branches yet: http://yarikoptic.github.io/numpy-vbench/#benchmarks-performance-analysis You are welcome to introduce additional (or replacement) detection goodness in http://github.com/yarikoptic/vbench/blob/HEAD/vbench/analysis.py and plotting is done here, I believe: https://github.com/yarikoptic/vbench/blob/HEAD/vbench/benchmark.py#L155 -- Yaroslav
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Mon, 2013-05-06 at 10:32 -0400, Yaroslav Halchenko wrote:
Float16 is special, it is cpu-bound -- not memory bound as most reductions -- because it is not a native type. First thought it was weird, but it actually makes sense, if you have a and b as float16: a + b is actually more like (I believe...): float16(float32(a) + float32(b)) This means there is type casting going on *inside* the ufunc! Normally casting is handled outside the ufunc (by the buffered iterator). Now I did not check, but when the iteration order is not optimized, the ufunc *can* simplify this to something similar to this (along the reduction axis): result = float32(a[0]) for i in xrange(a[1:]): result += float32(a.next()) return float16(result) While for "optimized" iteration order, this cannot happen because the intermediate result is always written back. This means for optimized iteration order a single conversion to float is necessary (in the inner loop), while for unoptimized iteration order two conversions to float and one back is done. Since this conversion is costly, the memory throughput is actually not important (no gain from buffering). This leads to the visible slowdown. This is of course a bit annoying, but not sure how you would solve it (Have the dtype signal that it doesn't even want iteration order optimization? Try to do move those weird float16 conversations from the ufunc to the iterator somehow?).
I think this is pretty cool! Probably would be a while until there are many tests, but if you or someone could set such thing up it could slowly grow when larger code changes are done? Regards, Sebastian
![](https://secure.gravatar.com/avatar/634276db962a81d2e8eb008236e8760d.jpg?s=120&d=mm&r=g)
On Mon, 06 May 2013, Sebastian Berg wrote:
that is the idea but it would be nice to gather such simple benchmark-tests. if you could hint on the numpy functionality you think especially worth benchmarking (I know -- there is a lot of things which could be set to be benchmarked) -- that would be a nice starting point: just list functionality/functions you consider of primary interest. and either it is worth testing for different types or just a gross estimate (e.g. for the selection of types in a loop) As for myself -- I guess I will add fancy indexing and slicing tests. Adding them is quite easy: have a look at https://github.com/yarikoptic/numpy-vbench/blob/master/vb_reduce.py which is actually a bit more cumbersome because of running them for different types. This one is more obvious: https://github.com/yarikoptic/numpy-vbench/blob/master/vb_io.py -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Mon, 2013-05-06 at 12:11 -0400, Yaroslav Halchenko wrote:
Indexing/assignment was the first thing I thought of too (also because fancy indexing/assignment really could use some speedups...). Other then that maybe some timings for small arrays/scalar math, but that might be nice for that GSoC project. Maybe array creation functions, just to see if performance bugs should sneak into something that central. But can't think of something else that isn't specific functionality. - Sebastian
![](https://secure.gravatar.com/avatar/18a30ce6d84de6ce5c11ce006d10f616.jpg?s=120&d=mm&r=g)
On 7 May 2013 13:47, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Why not going bigger? Ufunc operations on big arrays, CPU and memory bound. Also, what about interfacing with other packages? It may increase the compiling overhead, but I would like to see Cython in action (say, only last version, maybe it can be fixed).
![](https://secure.gravatar.com/avatar/634276db962a81d2e8eb008236e8760d.jpg?s=120&d=mm&r=g)
Hi Guys, not quite the recommendations you expressed, but here is my ugly attempt to improve benchmarks coverage: http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html initially I also ran those ufunc benchmarks per each dtype separately, but then resulting webpage is loong which brings my laptop on its knees by firefox. So I commented those out for now, and left only "summary" ones across multiple datatypes. There is a bug in sphinx which forbids embedding some figures for vb_random "as is", so pardon that for now... I have not set cpu affinity of the process (but ran it at nice -10), so may be that also contributed to variance of benchmark estimates. And there probably could be more of goodies (e.g. gc control etc) to borrow from https://github.com/pydata/pandas/blob/master/vb_suite/test_perf.py which I have just discovered to minimize variance. nothing really interesting was pin-pointed so far, besides that - svd became a bit faster since few months back ;-) http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html - isnan (and isinf, isfinite) got improved http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-i... - right_shift got a miniscule slowdown from what it used to be? http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-r... As before -- current code of those benchmarks collection is available at http://github.com/yarikoptic/numpy-vbench/pull/new/master if you have specific snippets you would like to benchmark -- just state them here or send a PR -- I will add them in. Cheers, On Tue, 07 May 2013, Daπid wrote:
Why not going bigger? Ufunc operations on big arrays, CPU and memory bound.
-- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
![](https://secure.gravatar.com/avatar/634276db962a81d2e8eb008236e8760d.jpg?s=120&d=mm&r=g)
FWIW -- updated plots with contribution from Julian Taylor http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_indexing.html#mmap... ;-) On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
Hi Guys,
not quite the recommendations you expressed, but here is my ugly attempt to improve benchmarks coverage:
http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html
There is a bug in sphinx which forbids embedding some figures for vb_random "as is", so pardon that for now...
nothing really interesting was pin-pointed so far, besides that
- svd became a bit faster since few months back ;-)
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html
- isnan (and isinf, isfinite) got improved
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-i...
- right_shift got a miniscule slowdown from what it used to be?
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-r...
As before -- current code of those benchmarks collection is available at http://github.com/yarikoptic/numpy-vbench/pull/new/master
if you have specific snippets you would like to benchmark -- just state them here or send a PR -- I will add them in.
Cheers,
On Tue, 07 May 2013, Daπid wrote:
Why not going bigger? Ufunc operations on big arrays, CPU and memory bound.
![](https://secure.gravatar.com/avatar/634276db962a81d2e8eb008236e8760d.jpg?s=120&d=mm&r=g)
Julian Taylor contributed some benchmarks he was "concerned" about, so now the collection is even better. I will keep updating tests on the same url: http://www.onerussian.com/tmp/numpy-vbench/ [it is now running and later I will upload with more commits for higher temporal fidelity] of particular interest for you might be: some minor consistent recent losses in http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-floa... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-floa... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int1... http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-int8 seems have lost more than 25% of performance throughout the timeline http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#memcpy-int8 "fast" calls to all/any seemed to be hurt twice in their life time now running *3 times slower* than in 2011 -- inflection points correspond to regressions and/or their fixes in those functions to bring back performance on "slow" cases (when array traversal is needed, e.g. on arrays of zeros for any) http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-all-fast http://www.onerussian.com/tmp/numpy-vbench/vb_vb_reduce.html#numpy-any-fast Enjoy On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
On Mon, 01 Jul 2013, Yaroslav Halchenko wrote:
Hi Guys,
not quite the recommendations you expressed, but here is my ugly attempt to improve benchmarks coverage:
http://www.onerussian.com/tmp/numpy-vbench-20130701/index.html
There is a bug in sphinx which forbids embedding some figures for vb_random "as is", so pardon that for now...
nothing really interesting was pin-pointed so far, besides that
- svd became a bit faster since few months back ;-)
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_linalg.html
- isnan (and isinf, isfinite) got improved
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-i...
- right_shift got a minuscule slowdown from what it used to be?
http://www.onerussian.com/tmp/numpy-vbench-20130701/vb_vb_ufunc.html#numpy-r...
As before -- current code of those benchmarks collection is available at http://github.com/yarikoptic/numpy-vbench/pull/new/master
if you have specific snippets you would like to benchmark -- just state them here or send a PR -- I will add them in.
Cheers,
On Tue, 07 May 2013, Daπid wrote:
Why not going bigger? Ufunc operations on big arrays, CPU and memory bound.
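For context on the "fast" all/any cases mentioned above: "fast" means the reduction can short-circuit almost immediately, so the benchmark measures call overhead rather than traversal. An illustrative stand-in with plain timeit -- not the collection's exact code, array size picked arbitrarily:

```python
import timeit
import numpy as np

n = 100000
fast_any = np.ones(n, dtype=bool)    # any() can bail out at element 0
slow_any = np.zeros(n, dtype=bool)   # any() must traverse the whole array

t_fast = min(timeit.repeat(lambda: np.any(fast_any), repeat=3, number=100))
t_slow = min(timeit.repeat(lambda: np.any(slow_any), repeat=3, number=100))
print("any() fast: %.6f s   slow: %.6f s" % (t_fast, t_slow))
```

A regression in the "fast" case therefore points at per-call overhead (setup, dispatch), while a regression in the "slow" case points at the traversal loop itself.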
and to put the findings reported so far into some kind of automated form, please welcome http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis

This is based on a simple 1-way ANOVA comparing the last 10 commits against a point in the past where 10 other commits had the smallest timing and were significantly different from the last 10.

"Possible recent" is probably too noisy, and I am not sure it is useful -- it should point to the diff closest in time (to the latest commits) where a significant excursion from current performance was detected. So per se it has nothing to do with the initially detected performance hit, but in some cases it still seems to reasonably locate commits hitting performance.

Enjoy,

On Tue, 09 Jul 2013, Yaroslav Halchenko wrote:
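To make the "simple 1-way anova" above concrete: with only two groups of 10 timings each, it reduces to comparing between-window variance to within-window variance. A self-contained sketch with made-up numbers -- not the actual vbench data or code:

```python
def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups of timings."""
    na, nb = len(group_a), len(group_b)
    ma = sum(group_a) / na
    mb = sum(group_b) / nb
    grand = (sum(group_a) + sum(group_b)) / (na + nb)
    # Between-group sum of squares (1 degree of freedom for 2 groups)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    # Within-group sum of squares (na + nb - 2 degrees of freedom)
    ss_within = sum((x - ma) ** 2 for x in group_a) + \
                sum((x - mb) ** 2 for x in group_b)
    return (ss_between / 1) / (ss_within / (na + nb - 2))

# Toy example: a slow recent 10-commit window vs an earlier, faster window.
recent = [0.210, 0.205, 0.212, 0.208, 0.211, 0.209, 0.207, 0.213, 0.206, 0.210]
older  = [0.150, 0.152, 0.149, 0.151, 0.148, 0.153, 0.150, 0.149, 0.152, 0.151]
f = anova_f(recent, older)
print("F = %.1f" % f)  # a large F means the two windows differ significantly
```

With two groups this F is just the square of the two-sample t statistic, so either test flags the same regressions.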
I have just added a few more benchmarks, and here they come
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_linalg.html#numpy-linalg-pi...

it seems to be very recent, so my only check, based on 10 commits, didn't pick it up yet, so it is not present in the summary table. could well be related to the 80% faster det()? ;)

norm was hit as well a bit earlier; it might well be within these commits:
https://github.com/numpy/numpy/compare/24a0aa5...29dcc54

I will now rerun benchmarking for the rest of the commits (I was running only the last commit of each day, iirc)

Cheers,

On Tue, 16 Jul 2013, Yaroslav Halchenko wrote:
and to put so far reported findings into some kind of automated form, please welcome
http://www.onerussian.com/tmp/numpy-vbench/#benchmarks-performance-analysis
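The linalg benchmarks in question boil down to timing calls like the following. A hedged sketch of the shape of such a snippet -- matrix sizes are illustrative, not the collection's exact code:

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)
a = rng.rand(100, 50)

# pinv computes an SVD under the hood, so it is far heavier than norm;
# a regression in either shows up as a jump in the best-of-3 time.
t_pinv = min(timeit.repeat(lambda: np.linalg.pinv(a), repeat=3, number=10))
t_norm = min(timeit.repeat(lambda: np.linalg.norm(a), repeat=3, number=10))
print("pinv: %.6f s   norm: %.6f s" % (t_pinv, t_norm))
```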
The biggest ~recent change in master's linalg was the switch to gufunc backends - you might want to check for that event in your commit log.

On 19 Jul 2013 23:08, "Yaroslav Halchenko" <lists@onerussian.com> wrote:
20.07.2013 01:38, Nathaniel Smith kirjoitti:
The biggest ~recent change in master's linalg was the switch to gufunc back ends - you might want to check for that event in your commit log.
That was in mid-April, which doesn't match the location of the uptick in the graph.

Pauli
At some point I hope to tune up the report with an option to view the plots using e.g. the nvd3 JS library, so it would be easier to pinpoint/analyze issues interactively.

On Sat, 20 Jul 2013, Pauli Virtanen wrote:
That was in mid-April, which doesn't match with the location of the uptick in the graph.
-- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
On Mon, 22 Jul 2013, Benjamin Root wrote:
"that's just sick!" do you know about any motion in python-sphinx world on supporting it? is there any demo page you would recommend to assess what to expect supported in upcoming webagg? -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Senior Research Associate, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik
On Mon, Jul 22, 2013 at 1:28 PM, Yaroslav Halchenko <lists@onerussian.com>wrote:
Oldie but goodie:
http://mdboom.github.io/blog/2012/10/11/matplotlib-in-the-browser-its-coming...

Official announcement:
http://matplotlib.org/1.3.0/users/whats_new.html#webagg-backend

Note, this is different from what is now available in the IPython Notebook (it isn't really interactive there). As for what is supported: just about everything you can do normally can be done in WebAgg. I have no clue about sphinx-level support.

Now, back to your regularly scheduled program.

Cheers!
Ben Root
On 7/19/13, Yaroslav Halchenko <lists@onerussian.com> wrote:
Well, this is embarrassing: https://github.com/numpy/numpy/pull/3539

Thanks for the benchmarks! I'm now an even bigger fan. :)

Warren
On Fri, 19 Jul 2013, Warren Weckesser wrote:
Well, this is embarrassing: https://github.com/numpy/numpy/pull/3539
Thanks for benchmarks! I'm now an even bigger fan. :)
Great to see that those came in handy! I had thought to provide more detail (benchmarking all recent commits) to pin down the exact point of the regression, but embarrassingly I made that run outside of the benchmarking chroot, so consistency was not guaranteed. Anyways -- rerunning it correctly now (with recent commits included).
Added some basic constructor benchmarks: http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html

quite a few fresh enhancements are present (cool), but also some freshly discovered elderly hits, e.g.
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-identity-10...
http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-ones-100

Cheers,

On Fri, 19 Jul 2013, Yaroslav Halchenko wrote:
could well be related to 80% faster det()? ;)
I am glad to announce that you can now see benchmark timing plots for multiple branches, making it possible to spot regressions in maintenance branches and to compare enhancements against previous releases, e.g.

* improving upon 1.7.x but still lagging behind 1.6.x
  http://www.onerussian.com/tmp/numpy-vbench/vb_vb_core.html#numpy-identity-10...
  http://www.onerussian.com/tmp/numpy-vbench/vb_vb_function_base.html#percenti...
  ...

* what seems to be a regression caught/fixed in 1.7.x:
  http://www.onerussian.com/tmp/numpy-vbench/vb_vb_indexing.html#a-indexes-flo...

* or not (yet) fixed in 1.7.x:
  http://www.onerussian.com/tmp/numpy-vbench/vb_vb_io.html#strided-assign-comp...

summary table generation is not yet adjusted for these multi-branch changes, so there might be misleading results there:
http://www.onerussian.com/tmp/numpy-vbench/index.html#benchmarks-performance...

Cheers,
FWIW -- updated runs of the benchmarks are available at http://yarikoptic.github.io/numpy-vbench which now also includes the maintenance/1.8.x branch (no divergences were detected yet). As far as I can see there are only recent improvements and no new performance regressions (though some old ones are still there; some might be specific to my CPU here).

Cheers,
On Fri, 06 Sep 2013, Daπid wrote:
some old ones are still there, some might be specific to my CPU here
How long does one run take? Maybe I can run it on my machine (Intel i5) for comparison.
In the current configuration, where I "target" each benchmark run to take around 200ms (thus possibly jumping up to 400ms), and thus 1-2 sec for the 3 actual runs needed to figure out the min among them -- on that elderly box it takes about a day to run the "end of the day" commits (iirc around 400 of them) and then 3-4 days for a full run (all commits). I am not sure that targeting 200ms is of any benefit, as opposed to 100ms, which would run twice as fast.

you are welcome to give it a shot right away: http://github.com/yarikoptic/numpy-vbench

it is still a bit ad-hoc, and I also use an additional shell wrapper to set CPU affinity (taskset -cp 1) and renice the benchmarking process to -10.
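The timing scheme described here -- scale a run up to the ~200ms target, do 3 runs, take the min -- can be sketched with the stdlib alone. The 200ms/3-run constants come from the message above; the code itself is an illustrative stand-in for the harness, not its actual source:

```python
import timeit

TARGET = 0.2   # seconds per run, as described above
RUNS = 3       # take the min of this many runs

def bench(stmt, setup="pass", target=TARGET, runs=RUNS):
    # Double the per-run loop count until a single run reaches the target;
    # this is why a run can land anywhere between target and 2*target.
    timer = timeit.Timer(stmt, setup=setup)
    number = 1
    while timer.timeit(number) < target:
        number *= 2
    # Best-of-N: the min is the least noise-contaminated measurement.
    per_call = min(timer.repeat(repeat=runs, number=number)) / number
    return per_call

print("%.9f s per call" % bench("sum(range(1000))"))
```

The real harness additionally pins the process to one core and renices it; plain timeit does neither, hence the taskset/renice wrapper mentioned above.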
On Fri, Sep 6, 2013 at 3:21 PM, Yaroslav Halchenko <lists@onerussian.com> wrote:
You would have enough data to add some quality-control bands to the charts (like a CUSUM chart, for example). Then it would be possible to send a congratulatory note or ring an alarm bell without looking at all the plots.

Josef
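A CUSUM-style detector of the kind suggested here fits in a dozen lines. A minimal sketch against a reference window of "known good" timings -- the numbers and thresholds are made up for illustration, not tuned to the vbench data:

```python
def cusum_alarm(timings, baseline=8, k=0.5, h=4.0):
    """One-sided (upward) tabular CUSUM against a reference window.

    baseline: number of leading points treated as "known good"
    k: slack (in std units) absorbed as normal jitter
    h: decision threshold (in std units) that raises the alarm
    Returns the index where a slowdown alarm fires, or None.
    """
    ref = timings[:baseline]
    mean = sum(ref) / len(ref)
    var = sum((x - mean) ** 2 for x in ref) / (len(ref) - 1)
    std = var ** 0.5 or 1e-12
    s_hi = 0.0
    for i, x in enumerate(timings[baseline:], start=baseline):
        z = (x - mean) / std
        s_hi = max(0.0, s_hi + z - k)   # accumulate upward excursions only
        if s_hi > h:
            return i
    return None

# Stable timings, then a persistent ~30% slowdown half-way through.
series = [1.00, 1.02, 0.99, 1.01, 1.00, 0.98, 1.01, 1.00,
          1.30, 1.31, 1.29, 1.32, 1.30, 1.31, 1.29, 1.30]
print(cusum_alarm(series))  # -> 8 (the first commit of the slowdown)
```

Because the statistic accumulates, it also catches slow drifts that no single-commit comparison would flag.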
On Fri, 06 Sep 2013, josef.pktd@gmail.com wrote:
well -- I did cook up some basic "detector", but I believe I haven't adjusted it for multiple branches yet: http://yarikoptic.github.io/numpy-vbench/#benchmarks-performance-analysis

you are welcome to introduce additional (or replacement) detection goodness:
http://github.com/yarikoptic/vbench/blob/HEAD/vbench/analysis.py

and plotting is done here, I believe:
https://github.com/yarikoptic/vbench/blob/HEAD/vbench/benchmark.py#L155
participants (9)
- Benjamin Root
- Charles R Harris
- Daπid
- josef.pktd@gmail.com
- Nathaniel Smith
- Pauli Virtanen
- Sebastian Berg
- Warren Weckesser
- Yaroslav Halchenko