[Numpy-discussion] -ffast-math

Julian Taylor jtaylor.debian at googlemail.com
Sun Dec 1 16:30:35 EST 2013


On 01.12.2013 21:53, Dan Goodman wrote:
> Julian Taylor <jtaylor.debian <at> googlemail.com> writes:
>> can you show the code that is slow in numpy?
>> which version of gcc and libc are you using?
>> with gcc 4.8 it uses the glibc 2.17 sin/cos with fast-math, so there
>> should be no difference.
> 
> In trying to write some simple code to demonstrate it, I realised it was
> weirdly more complicated than I thought. Previously I had been comparing
> numpy against weave on a complicated expression, namely a*sin(2.0*freq*pi*t)
> + b + v*exp(-dt/tau) + (-a*sin(2.0*freq*pi*t) - b)*exp(-dt/tau). Doing that
> with weave and no -ffast-math took the same time as numpy approximately, but
> with weave and -ffast-math it was about 30x faster. Here only a and v are
> arrays. Since numpy and weave with no -ffast-math took about the same time I
> assumed it wasn't memory bound but to do with the -ffast-math.
> 

this should be the code:

int N = _N;
for(int _idx=0; _idx<N; _idx++)
{
    double a = _array_neurongroup_a[_idx];
    double v = _array_neurongroup_v[_idx];
    double _v = a*sin(2.0*freq*pi*t) + b + v*exp(-dt/tau) +
                (-a*sin(2.0*freq*pi*t) - b)*exp(-dt/tau);
    v = _v;
    _array_neurongroup_v[_idx] = v;
}


Your sin and exp calls are loop invariant: they do not depend on the
loop variable.
This allows moving the expensive function calls out of the loop and
leaving only some simple arithmetic in the body.

Unfortunately IEEE 754 floating point (gcc's default mode) does not allow
this type of transformation: the operations are not associative, there are
special values to propagate, sticky exceptions to preserve, errno to set, etc.
All of this prevents gcc from doing it in its default mode.
-ffast-math tells gcc to ignore all these things and just make it fast,
so it will do the loop-invariant transformation in this case.
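
To make the "not associative" point concrete, here is a small sketch (the
function name is made up for this example). It assumes IEEE 754 doubles with
round-to-nearest; the volatile temporaries are only there to force each
intermediate result to be rounded to double.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when (x + y) + z and x + (y + z)
   round to the same double. */
bool sum_is_associative(double x, double y, double z)
{
    volatile double xy = x + y;      /* rounded once here */
    volatile double yz = y + z;      /* rounded once here */
    volatile double left = xy + z;
    volatile double right = x + yz;
    return left == right;
}
```

With x = 1e16, y = -1e16, z = 1.0 the two groupings differ: the left one
gives 1.0, while in the right one the 1.0 is lost when it is added to -1e16.
A compiler that must preserve the exact result therefore cannot regroup such
sums on its own.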

In this case -fno-math-errno alone, which drops the errno handling for
math functions that the C standard requires, seems to be enough.

In pure numpy you have to do this kind of transformation yourself, as
CPython has no optimizer that performs loop-invariant code motion.


