[Numpy-discussion] ANN: NumExpr3 Alpha

Fri Feb 17 11:42:09 EST 2017

Hi David,

Thanks for your comments, reply below the fold.

On Fri, Feb 17, 2017 at 4:34 PM, Daπid <davidmenhur at gmail.com> wrote:

> This is very nice indeed!
>
> On 17 February 2017 at 12:15, Robert McLeod <robbmcleod at gmail.com> wrote:
> > * bytes and unicode support
> > * reductions (mean, sum, prod, std)
>
> I use both a lot, maybe I can help you get them working.
>
> Also, regarding "Vectorization hasn't been done yet with cmath
> functions for real numbers (such as sqrt(), exp(), etc.), only for
> complex functions". What is the bottleneck? Is it in GCC or just
> someone has to sit down and adapt it?

I just haven't done it yet.  Basically I'm moving from Switzerland to
Canada in a week so this was the gap to push something out that's usable if
not perfect. Rather I just import cmath functions, which are inlined but I
suspect what's needed is to break them down into their components. For
example, the complex arccos function looks like this:

static void
nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r)
{
    npy_complex64 a;
    for( npy_intp I = 0; I < n; I++ ) {
        a = x[I];
        _inline_mul( x[I], x[I], r[I] );
        _inline_sub( Z_1, r[I], r[I] );
        _inline_sqrt( r[I], r[I] );
        _inline_muli( r[I], r[I] );
        _inline_add( a, r[I], r[I] );
        _inline_log( r[I] , r[I] );
        _inline_muli( r[I], r[I] );
        _inline_neg( r[I], r[I]);
    }
}

I haven't sat down and inspected whether the cmath versions get vectorized,
but there's not a huge speed difference between NE2 and 3 for such a
function on float (but their is for complex), so my suspicion is they
aren't.  Another option would be to add a library such as Yeppp! as
LIB_YEPPP or some other library that's faster than glib.  For example the
glib function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's
not how it should be.  Yeppp is also built with Python generating C code,
so it could either be very easy or very hard.

On bytes and unicode, I haven't seen examples for how people use it, so I'm
not sure where to start. Since there's practically not a limitation on the
number of operations now (the library is 1.3 MB now, compared to 1.2 MB for
NE2 with gcc 5.4) the string functions could grow significantly from what
we have in NE2.

With regards to reductions, NumExpr never multi-threaded them, and could
only do outer reductions, so in the end there was no speed advantage to be
had compared to having NumPy do them on the result.  I suspect the primary
value there was in PyTables and Pandas where the expression had to do
everything.  One of the things I've moved away from in NE3 is doing output
buffering (rather it pre-allocates the output array), so for reductions the
understanding NumExpr has of broadcasting would have to be deeper.

In any event contributions would certainly be welcome.

Robert

-- 
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225 <061%20387%2032%2025>
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch <robert.mcleod at ethz.ch>
robbmcleod at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20170217/1898e625/attachment.html>