Have you tried on an Intel CPU? I have both an i5 quad-core and an i7 octo-core where I could run it over the weekend. One might expect some compiler magic to take advantage of the advanced features, especially on the i7.

BTW -- fresh results are here: http://yarikoptic.github.io/numpy-vbench/

I have tuned the benchmarking so that it now reflects the best
performance across multiple executions of the whole battery, thus
eliminating the spurious variance that arises when an estimate comes
from a single point in time. Eventually I expect many of those curves
to become even "cleaner".
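
For illustration, the "best across multiple executions" idea can be sketched with the standard-library timeit module. This is not vbench's own code; the helper name best_time and the example statement are made up for the sketch.

```python
import timeit


def best_time(stmt, setup="pass", repeat=5, number=1000):
    """Return the best (minimum) per-call time in seconds over `repeat` runs.

    Hypothetical helper for illustration; not part of vbench.
    """
    # Each entry is the total wall time for `number` executions of stmt.
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number


if __name__ == "__main__":
    t = best_time("sum(data)", setup="data = list(range(1000))")
    print("best per-call time: %.3g s" % t)
```

Taking the minimum rather than the mean is the usual choice here, since background load can only make a run slower, never faster.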

On another note, what do you think of moving the vbench benchmarks
into the main numpy tree? We already require everyone who submits a
bug fix to add a test; there are a bunch of speed enhancements coming
in these days and it would be nice if we had some way to ask people to
submit a benchmark along with each one so that we know that the
enhancement stays enhanced...

On this positive note (it is boring to start a new thread, isn't it?) --
would you be interested in me transferring numpy-vbench over to
github.com/numpy ?

As of today, the plots on http://yarikoptic.github.io/numpy-vbench
should be updating 24x7 (it is just a loop, so there is no guarantee
on how soon results appear after you submit new changes).

Besides running new benchmarks (your PRs would still be very welcome;
so far it was just me and Julian T) and new revisions, that process
also goes through a random sample of previously benchmarked revisions
and re-runs the benchmarks, thus improving the final 'min' timing
estimate. So you can already see that many plots have become much
'cleaner', although there might now be a bit of bias in the estimates
for recent revisions, since they have not yet accumulated as many
'independent runs' as older revisions.
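
A rough sketch of that re-run loop, assuming a per-revision dictionary of best timings and run counts. The function rerun_sample, the revision names, and the noisy stand-in measurement are all hypothetical and are not taken from the actual numpy-vbench scripts.

```python
import random


def rerun_sample(best, runs, revisions, measure, k, rng=random):
    """Re-benchmark k randomly chosen revisions, keeping the minimum per revision.

    `measure(rev)` is assumed to return a timing in seconds for that revision.
    """
    for rev in rng.sample(revisions, k):
        t = measure(rev)
        # Keep the lowest time ever observed for this revision.
        best[rev] = min(t, best.get(rev, float("inf")))
        # Track how many independent runs each revision has accumulated.
        runs[rev] = runs.get(rev, 0) + 1
    return best, runs


if __name__ == "__main__":
    rng = random.Random(0)
    revisions = ["rev%d" % i for i in range(10)]

    def noisy(rev):
        # Hypothetical stand-in for a real benchmark: the same true cost
        # for every revision plus non-negative noise.
        return 1.0 + abs(rng.gauss(0.0, 0.1))

    best, runs = {}, {}
    for _ in range(20):
        rerun_sample(best, runs, revisions, noisy, k=3, rng=rng)
    for rev in revisions:
        print(rev, runs.get(rev, 0), best.get(rev))
```

Because a minimum over more samples can only go down, revisions with fewer accumulated runs tend to show slightly higher 'min' values even when the true cost is identical, which is the bias mentioned above.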

Using vbench, I created a comparison of gcc and clang with different
options.

Cliff notes:

* gcc -O2 performs 5-10% better than -O3 in most benchmarks, except in
  a few select cases where the vectorizer does its magic
* gcc and clang are very close in performance, but in the cases where
  one compiler wins by a large margin, it's mostly gcc that wins

I have collected some interesting plots in this notebook:
http://nbviewer.ipython.org/7646615

