
Hi,
This morning I was wondering whether we ought to plan to devote some resources to collaborating with the OpenBLAS team.
Summary: we should explore ways of setting up numpy as a test engine for OpenBLAS development.
Detail:
I am getting the impression that OpenBLAS is looking like the most likely medium-term solution for open-source stack builds of numpy and scipy, on Linux and Windows at least.
ATLAS has been our choice for this up until now, but it is designed to be optimized for a particular CPU configuration, which will likely make it slow on some or even most of the machines a general installer gets installed on. This is only likely to get more severe over time, because current ATLAS development is on multi-core optimization, where the number of cores may need to be set at compile time.
The worry about OpenBLAS has always been that it is hard to maintain and that fixes don't always have tests. There might be other alternatives that are a better bet technically, but they don't currently have OpenBLAS's dynamic selection features or CPU support.
It is relatively easy to add tests using Python / numpy. We like tests. Why don't we propose a collaboration with OpenBLAS where we build and test numpy with every / most / some commits of OpenBLAS, and try to make it easy for the OpenBLAS team to add tests? Maybe we can use and add to the list of machines on which OpenBLAS is tested [1]? We Berkeley Pythonistas can certainly add the machines at our buildbot farm [2]. Maybe the Julia / R developers would be interested in helping too?
Cheers,
Matthew
[1] https://github.com/xianyi/OpenBLAS/wiki/Machine-List
[2] http://nipy.bic.berkeley.edu/buildslaves

On 05/26/2015 04:56 PM, Matthew Brett wrote:
Hi,
This morning I was wondering whether we ought to plan to devote some resources to collaborating with the OpenBLAS team.
It is relatively easy to add tests using Python / numpy. We like tests. Why don't we propose a collaboration with OpenBLAS where we build and test numpy with every / most / some commits of OpenBLAS, and try to make it easy for the OpenBLAS team to add tests? Maybe we can use and add to the list of machines on which OpenBLAS is tested [1]? We Berkeley Pythonistas can certainly add the machines at our buildbot farm [2]. Maybe the Julia / R developers would be interested in helping too?
Technically we only need a single machine with the newest instruction set available. All other cases could then be tested via a virtual machine that only exposes specific instruction sets (e.g. qemu, which could technically also emulate instructions the host does not have).
Concerning test generation, there is a huge parameter space that needs testing with OpenBLAS, and at least some of it would need to be automated/fuzzed. We also need specific preconditioning of memory to test failure cases OpenBLAS had in the past, e.g. filling the memory around the matrices with NaNs, and also somehow filling OpenBLAS's own temporary buffers with signaling values (which might require a specially built OpenBLAS if MALLOC_PERTURB_ does not work).
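To make the NaN-guard idea concrete, here is a minimal sketch (the helper name and padding width are made up for illustration): place the GEMM output inside a larger NaN-filled buffer and check that the guard rows survive the call.

import numpy as np

def gemm_with_guard_rows(a, b, pad=4):
    # Run np.dot with the output inside a NaN-filled buffer and check
    # that the guard rows above and below the result stay untouched.
    m, n = a.shape[0], b.shape[1]
    buf = np.full((m + 2 * pad, n), np.nan)
    out = buf[pad:pad + m, :]  # C-contiguous interior view, as out= requires
    np.dot(a, b, out=out)
    assert np.isnan(buf[:pad]).all() and np.isnan(buf[-pad:]).all(), \
        "BLAS wrote outside the output matrix"
    return out

rng = np.random.RandomState(0)
gemm_with_guard_rows(rng.randn(13, 7), rng.randn(7, 5))

The same trick can be applied around the input operands; guarding OpenBLAS's internal buffers would indeed need MALLOC_PERTURB_ or a special build, as you say.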
Maybe it would be feasible to write a Hypothesis [0] strategy for some of the BLAS stuff to automate the parameter exploration.
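As a purely illustrative sketch of that (the strategy and test names are made up), a composite strategy could draw shapes and entries, then compare np.dot against an einsum reference that bypasses the BLAS:

import numpy as np
from hypothesis import given, strategies as st

@st.composite
def gemm_inputs(draw, max_dim=16):
    # Draw matrix dimensions, then flat lists of bounded finite floats.
    m = draw(st.integers(min_value=1, max_value=max_dim))
    k = draw(st.integers(min_value=1, max_value=max_dim))
    n = draw(st.integers(min_value=1, max_value=max_dim))
    elems = st.floats(min_value=-1e3, max_value=1e3)
    a = np.array(draw(st.lists(elems, min_size=m * k, max_size=m * k)))
    b = np.array(draw(st.lists(elems, min_size=k * n, max_size=k * n)))
    return a.reshape(m, k), b.reshape(k, n)

@given(gemm_inputs())
def test_dot_matches_reference(ab):
    a, b = ab
    # einsum uses numpy's own loops, so it is independent of the BLAS.
    expected = np.einsum('ik,kj->ij', a, b)
    np.testing.assert_allclose(np.dot(a, b), expected, rtol=1e-7, atol=1e-6)

Hypothesis would then shrink any failing shape/content combination to a minimal reproducer, which seems exactly what OpenBLAS bug reports need.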
And then we'd need to run everything under valgrind, since due to the assembly implementation of OpenBLAS we can't use the faster address sanitizers that GCC and Clang now provide.

Hi,
On Tue, May 26, 2015 at 12:53 PM, Julian Taylor jtaylor.debian@googlemail.com wrote:
[snip]
All this sounds extremely useful.
What do you think we should do next? How feasible is it to start to set this kind of thing up for our own use, and then offer to integrate with OpenBLAS?
Is there anyone out there who knows the Julia and / or R communities well enough to know whether they would be interested in helping? What kind of help do you think we need? Money for a machine?
Cheers,
Matthew

On Tue, May 26, 2015 at 9:53 AM, Julian Taylor jtaylor.debian@googlemail.com wrote:
[snip]
Technically we only need a single machine with the newest instruction set available. All other cases could then be tested via a virtual machine that only exposes specific instruction sets (e.g. qemu, which could technically also emulate instructions the host does not have).
Concerning test generation, there is a huge parameter space that needs testing with OpenBLAS, and at least some of it would need to be automated/fuzzed. We also need specific preconditioning of memory to test failure cases OpenBLAS had in the past, e.g. filling the memory around the matrices with NaNs, and also somehow filling OpenBLAS's own temporary buffers with signaling values (which might require a specially built OpenBLAS if MALLOC_PERTURB_ does not work).
A lot of this stuff is easier if we take a white-box instead of black-box approach -- adding hooks in OpenBLAS to override the CPU-based kernel-autoselection sounds a lot easier than creating unnatural machines in qemu, and similarly for initializing temporary buffers. (I would be really unsurprised if OpenBLAS re-uses temporary buffers across calls instead of doing a free/re-malloc, for example.)
Maybe it would be feasible to write a Hypothesis [0] strategy for some of the BLAS stuff to automate the parameter exploration.
Or if this is daunting, you can get pretty far just sitting down and writing some for loops... I think this is a case where something is a lot better than nothing :-).
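For what it's worth, a brute-force sweep along those lines might look like this (shape list and tolerances picked arbitrarily for illustration, with sizes straddling typical kernel blocking boundaries):

import itertools
import numpy as np

rng = np.random.RandomState(42)
shapes = [1, 2, 3, 7, 16, 63, 64, 65]
for dtype, order, m, k, n in itertools.product(
        [np.float32, np.float64], ['C', 'F'], shapes, shapes, shapes):
    a = np.asarray(rng.randn(m, k), dtype=dtype, order=order)
    b = np.asarray(rng.randn(k, n), dtype=dtype, order=order)
    # Reference computed in double precision without going through BLAS.
    expected = np.einsum('ik,kj->ij',
                         a.astype(np.float64), b.astype(np.float64))
    tol = 1e-3 if dtype is np.float32 else 1e-10
    np.testing.assert_allclose(np.dot(a, b), expected, rtol=tol, atol=tol)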
-n

2015-05-27 10:13 GMT+02:00 Nathaniel Smith njs@pobox.com:
[snip]
A lot of this stuff is easier if we take a white-box instead of black-box approach -- adding hooks in OpenBLAS to override the CPU-based kernel-autoselection sounds a lot easier than creating unnatural machines in qemu, and similarly for initializing temporary buffers. (I would be really unsurprised if OpenBLAS re-uses temporary buffers across calls instead of doing a free/re-malloc, for example.)
Manually overriding the OpenBLAS CPU autoselection can easily be done by setting the OPENBLAS_CORETYPE environment variable, e.g. export OPENBLAS_CORETYPE=Nehalem.
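For example (this assumes a DYNAMIC_ARCH build of OpenBLAS; otherwise the kernels are fixed at compile time), the variable has to be in the environment before the library is loaded, i.e. before the first numpy import in the process:

import os

# Must happen before OpenBLAS is loaded, so before importing numpy.
os.environ['OPENBLAS_CORETYPE'] = 'Nehalem'

import numpy as np

a = np.random.randn(100, 100)
np.dot(a, a)  # now runs the Nehalem kernels regardless of the actual CPU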

2015-05-27 10:26 GMT+02:00 Carl Kleffner cmkleffner@gmail.com:
On 05/26/2015 04:56 PM, Matthew Brett wrote:
[snip]
Some benchmark results made by @wernsaar can be found at http://sourceforge.net/p/slurm-roll/code/HEAD/tree/branches/benchmark/ . I guess these were made on Linux, so they cannot be directly applied to Windows; see e.g. https://github.com/xianyi/OpenBLAS/issues/532. In general, the OpenBLAS development trunk runs smoothly on Windows now.

Matthew Brett matthew.brett@gmail.com wrote:
I am getting the impression that OpenBLAS is looking like the most likely medium-term solution for open-source stack builds of numpy and scipy, on Linux and Windows at least.
I think you're right.
OpenBLAS might even be a long-term solution. We should also consider that GotoBLAS (and GotoBLAS2) powered some of the world's most expensive supercomputers for a decade. It is not as if this is untested software.
The remaining test errors on Windows are also due to MSVC and MinGW-w64 differences, not due to OpenBLAS itself, and those are not relevant on Linux.
On Apple, I am not sure which is better. Accelerate is faster in some corner cases (level-1 BLAS with AVX, operations on very small matrices), but it has issues with multiprocessing (GCD's thread pool is not fork-safe). Apart from that, OpenBLAS and Accelerate are about equivalent in performance. I have built OpenBLAS on OSX with clang and gfortran, and it works like a charm. So it might be worth considering for binary wheels on OSX as well.
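A minimal sketch of the multiprocessing problem (illustrative only; the exact failure mode depends on the OSX and Accelerate versions): use the BLAS once in the parent, then fork workers that call it again.

import multiprocessing
import numpy as np

def child_dot(_):
    a = np.random.randn(200, 200)
    return float(np.dot(a, a).sum())

if __name__ == '__main__':
    np.dot(np.ones((10, 10)), np.ones((10, 10)))  # touch BLAS in the parent
    # With Accelerate the forked workers may hang or crash inside the BLAS
    # call; with OpenBLAS (or the 'spawn' start method) this completes.
    pool = multiprocessing.Pool(2)
    print(pool.map(child_dot, range(4)))
    pool.close()
    pool.join()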
Sturla

On Tue, May 26, 2015 at 7:56 AM, Matthew Brett matthew.brett@gmail.com wrote:
Hi,
This morning I was wondering whether we ought to plan to devote some resources to collaborating with the OpenBLAS team.
Sounds like a great idea to me. Even a bit familiar :-) http://thread.gmane.org/gmane.comp.python.numeric.general/57498
The lead developers of both OpenBLAS and BLIS are currently at UT Austin: http://shpc.ices.utexas.edu/people.html ...and it turns out that this is also where SciPy will be held in July. Might be a good opportunity for numpy/scipy folks interested in these matters to sit down in the same room as them and hash out some kind of shared plan of action.
(NB: I'm told that BLIS now has full multi-threading support, and that they are working on runtime CPU detection and kernel auto-selection right now.)
-n

Hi,
On 5/27/15, Nathaniel Smith njs@pobox.com wrote:
[snip]
Sounds like a great idea to me. Even a bit familiar :-) http://thread.gmane.org/gmane.comp.python.numeric.general/57498
I had forgotten that thread, thanks for reminding me. I guess my idea arose from a forgotten memory of that thread, but I was thinking that it may be less of a burden, and allow more sharing of work, if we concentrate on testing. For example, if we have a testing repo on the OpenBLAS org, I can imagine Julia / R developers finding bugs and adding tests to the Python repo, because the machinery to do that is already built and documented (in my perfect world).
The lead developers of both OpenBLAS and BLIS are currently at UT Austin: http://shpc.ices.utexas.edu/people.html ...and it turns out that this is also where SciPy will be held in July. Might be a good opportunity for numpy/scipy folks interested in these matters to sit down in the same room as them and hash out some kind of shared plan of action.
I'm afraid I'm not going to SciPy this year. Nathaniel - would you consider organizing something like this, with able help from those of us - going or not - who can contribute some time?
(NB: I'm told that BLIS now has full multi-threading support, and that they are working on runtime CPU detection and kernel auto-selection right now.)
I can well imagine that BLIS will be a good option at some point, but I'm guessing that it is unlikely we will be able to use BLIS as our default BLAS / LAPACK library on Linux / Windows / Mac in the near future. Is that right?
Cheers,
Matthew
participants (5)
- Carl Kleffner
- Julian Taylor
- Matthew Brett
- Nathaniel Smith
- Sturla Molden