I want to point out that Intel provides a very interesting open-source compiler, based on LLVM, that generates vectorized code for SSE and AVX instructions on x86 and x86_64 platforms.

Carl

https://github.com/ispc/ispc/
http://ispc.github.io/

quote:

ispc is a compiler for a variant of the C programming language, with extensions for "single program, multiple data" (SPMD) programming. Under the SPMD model, the programmer writes a program that generally appears to be a regular serial program, though the execution model is actually that a number of program instances execute in parallel on the hardware. (See the ispc documentation for more details and examples that illustrate this concept.)

...

ispc is an open source compiler with a BSD license. It uses the remarkable LLVM Compiler Infrastructure for back-end code generation and optimization and is hosted on github. It supports Windows, Mac, and Linux, with both x86 and x86-64 targets. It currently supports the SSE2, SSE4, AVX1, AVX2, and Xeon Phi "Knight's Corner" instruction sets.

2014-03-03 22:51 GMT+01:00 Ralf Gommers <ralf.gommers@gmail.com>:
On Mon, Mar 3, 2014 at 8:20 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
hi,
as the numpy gsoc topic page is a little short on options I was thinking about adding two topics for interested students. But as I have no experience with gsoc or mentoring and the ideas are not very fleshed out yet I'd like to ask if it might make sense at all:
1. configurable algorithm precision
some functions in numpy could be implemented in different ways depending on requirements on speed and numerical precision. Two examples that come to my mind are hypot and sum
hypot (or abs(complex)) uses the C99 hypot function, which guarantees 1 ulp precision but is very slow compared to a simple sqrt(a**2 + b**2). This precision might not be required for all applications; overflow safety might be enough.
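The trade-off can be illustrated with a minimal sketch in plain Python (not NumPy's C implementation). The names naive_hypot and scaled_hypot are hypothetical, and the scaled variant is only one possible overflow-safe formula; it makes no 1 ulp accuracy claim and does not handle inf or nan inputs.

```python
import math

def naive_hypot(a, b):
    # Fast formula from the text; the squares can overflow to inf
    # even when the true result is representable.
    return math.sqrt(a * a + b * b)

def scaled_hypot(a, b):
    # Overflow-safe variant (illustrative): factor out the larger
    # magnitude so the squared ratio is always <= 1.
    a, b = abs(a), abs(b)
    big, small = max(a, b), min(a, b)
    if big == 0.0:
        return 0.0
    ratio = small / big
    return big * math.sqrt(1.0 + ratio * ratio)

x = 1e200
print(naive_hypot(x, x))   # intermediate squares overflow
print(scaled_hypot(x, x))  # stays finite
print(math.hypot(x, x))    # library reference
```

The scaled version costs an extra division compared with the naive formula, which is the kind of middle ground between "fast" and "standard" that a precision option could expose.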
summation in numpy 1.9 is performed via pairwise summation, which has O(log(n)*e) error properties, but only for the fast axis. An alternative O(e) approach would be Kahan summation, which works for all axes but is 4 times slower than normal summation (a bit can be regained via vectorization, though).
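For comparison, here is a rough pure-Python sketch of the three summation strategies mentioned. It is not NumPy's code: the function names are hypothetical and the block size of 8 is an arbitrary choice for illustration.

```python
import math

def naive_sum(xs):
    # Plain left-to-right accumulation.
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs, lo=0, hi=None, block=8):
    # Split the range in half recursively; sum small blocks naively.
    if hi is None:
        hi = len(xs)
    if hi - lo <= block:
        total = 0.0
        for i in range(lo, hi):
            total += xs[i]
        return total
    mid = (lo + hi) // 2
    return pairwise_sum(xs, lo, mid, block) + pairwise_sum(xs, mid, hi, block)

def kahan_sum(xs):
    # Compensated summation: c carries the rounding error of each add.
    total = 0.0
    c = 0.0
    for x in xs:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

# One large value followed by many values too small to register
# individually when added to it.
data = [1.0] + [1e-16] * 100_000
reference = math.fsum(data)  # correctly rounded reference sum
for f in (naive_sum, pairwise_sum, kahan_sum):
    print(f.__name__, abs(f(data) - reference))
```

The extra subtractions in kahan_sum are where the slowdown relative to plain summation comes from.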
My idea is to have an option to change the algorithms used in numpy depending on the set requirements. E.g.
with np.precmode(default="fast"):
    np.abs(complex_array)
or fast everything except sum and hypot
with np.precmode(default="fast", sum="kahan", hypot="standard"):
    np.sum(d)
I have not thought much about implementation; it might be tricky to get this threadsafe in the current ufunc model.
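As a rough illustration of how such an option could look at the Python level only, the sketch below keeps the settings in a contextvars.ContextVar instead of a module-level global, so a `with` block does not mutate state shared between threads. Everything here is hypothetical: precmode does not exist in numpy, current_algorithm and the toy hypot are made-up helpers, and nothing is implied about how the C-level ufunc machinery would read the setting.

```python
import contextlib
import contextvars
import math

# Hypothetical per-context settings; "standard" is the assumed default.
_settings = contextvars.ContextVar("precmode", default={"default": "standard"})

@contextlib.contextmanager
def precmode(**overrides):
    # Build a new dict rather than mutating the current one, then
    # restore the previous settings on exit, even after an exception.
    token = _settings.set({**_settings.get(), **overrides})
    try:
        yield
    finally:
        _settings.reset(token)

def current_algorithm(func_name):
    # Per-function override if present, otherwise the block's default.
    settings = _settings.get()
    return settings.get(func_name, settings["default"])

def hypot(a, b):
    # Toy dispatch on the active mode.
    if current_algorithm("hypot") == "fast":
        return math.sqrt(a * a + b * b)
    return math.hypot(a, b)

with precmode(default="fast", sum="kahan", hypot="standard"):
    print(current_algorithm("abs"))    # falls back to the block default
    print(current_algorithm("sum"))
    print(current_algorithm("hypot"))
print(current_algorithm("sum"))        # previous settings restored
```

Nested blocks compose because each one layers its overrides on top of the settings active when it was entered.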
2. vector math library integration
some operations like powers, sin, cos etc. are relatively slow in numpy, depending on the C library used. There are now a few free libraries available that make use of modern hardware to speed these operations up, e.g. sleef and yeppp (also MKL, but I have no interest in supporting non-free software). It might be interesting to investigate whether these libraries can be integrated with numpy. This also somewhat ties in with the configurable precision mode, as the vector math libraries often have different options depending on precision and speed requirements.
Do those sound like topics we could add to our wiki?
To me (2) sounds potentially very interesting, and definitely enough for a GSoC. Would need a very talented student to take this on.
(1) I'm not sure about, somehow just doesn't sound like it would have a big impact. Also maybe not enough work to fill 3+ months with?
Ralf
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion