[Numpy-discussion] Rookie problems - Why is C-code much faster?

Tue Feb 21 14:24:05 EST 2006

It turns out that  around (or round_) is implemented in python:

def round_(a, decimals=0):
    """Round 'a' to the given number of decimal places.  Rounding
    behaviour is equivalent to Python.

    Return 'a' if the array is not floating point.  Round both the real
    and imaginary parts separately if the array is complex.
    """
    a = asarray(a)
    if not issubclass(a.dtype.type, _nx.inexact):
        return a
    if issubclass(a.dtype.type, _nx.complexfloating):
        return round_(a.real, decimals) + 1j*round_(a.imag, decimals)
    if decimals is not 0:
        decimals = asarray(decimals)
    s = sign(a)
    if decimals is not 0:
        a = absolute(multiply(a, 10.**decimals))
    else:
        a = absolute(a)
    rem = a-asarray(a).astype(_nx.intp)
    a = _nx.where(_nx.less(rem, 0.5), _nx.floor(a), _nx.ceil(a))
    # convert back
    if decimals is not 0:
        return multiply(a, s/(10.**decimals))
    else:
        return multiply(a, s)

I see many ways to improve the performance here.  First, there is no
need to check for "decimals is not 0" three times. This can be done
once, maybe at the expense of some code duplication.  Second,
_nx.where(_nx.less(rem, 0.5), _nx.floor(a), _nx.ceil(a)) seems to be
equivalent to _nx.floor(a+0.5). Finally, if rint is implemented as a
ufunc as Mads originally suggested, "decimals is 0" branch can just
call that.

It is tempting to rewrite the whole thing in C, but before I  do that
I have a few questions about current implementation.

1.  It is implemented in oldnumeric.py .  Does this mean it is
deprecated.  If so, what is the recommended replacement?

2.  Was it intended to support  array and fractional values for
decimals or is it an implementation artifact. Currently:

>>> around(array([1.2345]*5),[1,2,3,4,5])
array([ 1.2   ,  1.23  ,  1.235 ,  1.2345,  1.2345])
>>> around(1.2345,2.5)
array(1.2332882874656679)

3. It does nothing to exact types, even if decimals<0
>>> around(1234, -2)
array(1234)

Is this a bug? Consider that
>>> round(1234, -2)
1200.0
and
>>> around(1234., -2)
array(1200.0)

Docstring is self-contradictory: "Rounding behaviour is equivalent to
Python" is not consistent with "Return 'a' if the array is not
floating point."

I propose to deprecate around and implement a new "round" member
function in C that will only accept scalar "decimals"  and will behave
like a properly vectorized builtin round.  I will do the coding if
there is interest.

In any case, something has to be done here.  I don't think the
following timings are acceptable:

> python -m timeit -s "from numpy import array; x = array([1.5]*1000)" "(x+0.5).astype(int).astype(float)"
100000 loops, best of 3: 18.8 usec per loop
> python -m timeit -s "from numpy import array, around; x = array([1.5]*1000)" "around(x)"
10000 loops, best of 3: 155 usec per loop

On 2/21/06, Sasha <ndarray at mac.com> wrote:
> On the second thought, the difference between around and astype is not
> surprising because around operates in terms of decimals.  Rather than
> adding rint, I would suggest to make a special case decimals=0 use C
> rint.
>
> > On 2/21/06, Mads Ipsen <mpi at osc.kiku.dk> wrote:
> > > I suggest that rint() is added as a ufunc or is there any concerns
> > > here that I am not aware of?
> >
> > You might want to use astype(int).  On my system it is much faster than around:
> >
> > > python -m timeit -s "from numpy import array, around; x = array([1.5]*1000)" "around(x)"
> > 10000 loops, best of 3: 176 usec per loop
> > > python -m timeit -s "from numpy import array, around; x = array([1.5]*1000)" "x.astype(int)"
> > 100000 loops, best of 3: 3.2 usec per loop
> >
> > the difference is too big to be explained by the fact that around
> > allocates twice as much memory for the result.  In fact the following
> > equivalent of rint is still very fast:
> >
> > > python -m timeit -s "from numpy import array, around; x = array([1.5]*1000)" "x.astype(int).astype(float)"
> > 100000 loops, best of 3: 6.48 usec per loop
> >
>