[Numpy-discussion] Why is numpy.abs so much slower on complex64 than complex128 under windows 32-bit?

Tue Apr 10 12:57:04 EDT 2012

On 4/10/12 9:55 AM, Henry Gomersall wrote:
> On 10/04/2012 16:36, Francesc Alted wrote:
>> In [10]: timeit c = numpy.complex64(numpy.abs(numpy.complex128(b)))
>> 100 loops, best of 3: 12.3 ms per loop
>>
>> In [11]: timeit c = numpy.abs(b)
>> 100 loops, best of 3: 8.45 ms per loop
>>
>> in your windows box and see if they raise similar results?
>>
> No, the results are somewhat the same as before - ~40ms for the first
> (upcast/downcast) case and ~150ms for the direct case (both *much*
> slower than yours!). This is versus ~28ms for operating directly on
> double precisions.

Okay, so it seems that something is going wrong with the performance 
of pure complex64 abs() on Windows.
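To reproduce the comparison outside IPython, here is a minimal sketch. It assumes `b` is a 1e6-element complex64 array; the thread doesn't show how `b` was built, so the random data below is a hypothetical stand-in. It times abs() applied directly to complex64 against the upcast/abs/downcast route and checks that both give the same values to float32 precision. Unlike the session quoted above, it downcasts the result to float32 (the natural dtype of abs on complex64) rather than to complex64.

```python
import timeit

import numpy as np

# Hypothetical stand-in for `b`: 1e6 random complex64 values
rng = np.random.default_rng(0)
n = 1_000_000
b = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)


def direct(x):
    # abs() evaluated directly on complex64, giving float32
    return np.abs(x)


def via_double(x):
    # Upcast to complex128, take abs() in double precision, downcast to float32
    return np.abs(x.astype(np.complex128)).astype(np.float32)


# Both routes should agree to within float32 rounding
assert np.allclose(direct(b), via_double(b), rtol=1e-6)

for name, fn in [("direct complex64", direct), ("via complex128", via_double)]:
    # Best of 3 repeats of 20 calls each, reported per call like %timeit
    best = min(timeit.repeat(lambda: fn(b), number=20, repeat=3)) / 20
    print(f"{name}: {best * 1e3:.2f} ms per loop")
```

If the direct route is several times slower than the upcast route on one platform only, as reported above, the single-precision kernel on that platform is the likely suspect.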

>
> I'm using numexpr in the end, but this is slower than numpy.abs under linux.

Oh, you mean the Windows version of abs(complex64) in numexpr is slower 
than a pure numpy.abs(complex64) under Linux?  That's weird, because 
numexpr implements the complex operations independently of the NumPy 
machinery.  Here is how abs() is implemented in numexpr:

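/* x and r point to numexpr's double-precision complex struct
   (fields .real and .imag) */
static void
nc_abs(cdouble *x, cdouble *r)
{
    /* |z| = sqrt(re^2 + im^2), computed in double precision */
    r->real = sqrt(x->real*x->real + x->imag*x->imag);
    r->imag = 0;  /* stored as a complex value with zero imaginary part */
}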

[As I said, only the double-precision version is implemented, so you 
also have to add the cost of casting complex64 to complex128.]
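Since nc_abs only exists for doubles, a complex64 input goes through a cast first. A rough NumPy sketch of that path follows; the function names are hypothetical, for illustration only. It also shows a side benefit of doing the arithmetic in double: the plain sqrt(re*re + im*im) formula cannot overflow for finite complex64 inputs, because the largest float32 squared (about 1.2e77) is far below the double-precision limit. The same formula kept in float32 overflows once a component exceeds roughly 1.8e19.

```python
import numpy as np


def nc_abs_like(x):
    """Mimic numexpr's double-only nc_abs for a complex64 array (sketch)."""
    z = np.asarray(x).astype(np.complex128)  # the extra cast to double
    re, im = z.real, z.imag
    return np.sqrt(re * re + im * im)        # float64 result


def naive_abs_float32(x):
    """Same formula kept in single precision, where re*re can overflow."""
    re, im = x.real, x.imag                  # float32 parts of complex64
    with np.errstate(over="ignore"):         # silence the overflow warning
        return np.sqrt(re * re + im * im)


b = np.array([3 + 4j, 1e20 + 1e20j], dtype=np.complex64)
print(nc_abs_like(b))        # 5.0 and roughly 1.414e20
print(naive_abs_float32(b))  # 5.0 and inf (1e40 overflows float32)
```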

Hmm, considering all of these facts, could it be that sqrtf() on Windows 
is underperforming?  Can you try this:

In [68]: a = numpy.linspace(0, 1, 1000000)

In [69]: b = numpy.float32(a)

In [70]: timeit c = numpy.sqrt(a)
100 loops, best of 3: 5.64 ms per loop

In [71]: timeit c = numpy.sqrt(b)
100 loops, best of 3: 3.77 ms per loop

and tell us what you get on Windows?
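For anyone not using IPython, the same sqrt() comparison can be run as a plain script. This is a minimal sketch using the standard timeit module; `best_ms` is a hypothetical helper that mimics the best-of-3 figure %timeit reports.

```python
import timeit

import numpy as np

# Same setup as the IPython session: 1e6 points in [0, 1]
a = np.linspace(0, 1, 1_000_000)  # float64
b = a.astype(np.float32)          # float32 copy, like numpy.float32(a)


def best_ms(func, number=100, repeat=3):
    """Best-of-`repeat` time per call in milliseconds, like %timeit."""
    total = min(timeit.repeat(func, number=number, repeat=repeat))
    return total / number * 1e3


t64 = best_ms(lambda: np.sqrt(a))
t32 = best_ms(lambda: np.sqrt(b))
print(f"sqrt float64: {t64:.2f} ms per loop")
print(f"sqrt float32: {t32:.2f} ms per loop")
```

On a healthy build the float32 timing should be no slower than the float64 one, as in the numbers above. If float32 comes out clearly slower on Windows, that would support the sqrtf() suspicion.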

PS: if you are using numexpr on Windows, you may want to use the 
MKL-linked version, which uses MKL's abs() and should have considerably 
better performance.
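To try the numexpr route, here is a minimal sketch. It assumes numexpr may not be installed and falls back to numpy.abs in that case. It casts to complex128 explicitly because numexpr's complex kernels are double-only, and it takes `.real` because, going by nc_abs above, abs() on complex input may come back as a complex array with a zero imaginary part. `abs_complex` is a hypothetical helper name. `use_vml` is the flag numexpr exposes to report whether it was built with VML (MKL) support; it is read with getattr so the sketch does not depend on it.

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:  # numexpr is optional here; fall back to NumPy
    ne = None


def abs_complex(x):
    """|x| for a complex array, via numexpr when it is available."""
    if ne is None:
        return np.abs(x)
    # numexpr's complex kernels work in double precision, so cast explicitly
    z = np.asarray(x, dtype=np.complex128)
    # abs() of a complex input may come back complex with zero imaginary
    # part (as in nc_abs); .real gives a float array in either case
    return np.asarray(ne.evaluate("abs(z)")).real


if ne is not None:
    # True only for builds linked against Intel's VML (part of MKL)
    print("numexpr", ne.__version__, "uses VML:", getattr(ne, "use_vml", False))

b = np.array([3 + 4j, 1j], dtype=np.complex64)
print(abs_complex(b))  # [5. 1.]
```

Whether a particular build actually routes complex abs() through MKL is worth checking with a timing run like the ones above rather than assuming.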

-- 
Francesc Alted