[Numpy-discussion] 2-d in-place operation performance vs 1-d non in-place

George Sakkis george.sakkis at gmail.com
Thu Sep 6 07:30:43 EDT 2007


On Sep 5, 12:29 pm, Francesc Altet <fal... at carabos.com> wrote:
> On Wednesday 05 September 2007, George Sakkis wrote:
>
>
>
> > I was surprised to see that an in-place modification of a 2-d array
> > turns out to be slower than the corresponding non-mutating operation
> > on 1-d arrays, although the latter creates new array objects. Here is
> > the benchmarking code:
>
> > import timeit
>
> > for n in 10,100,1000,10000:
> >    setup = 'from numpy.random import random;' \
> >            'm=random((%d,2));' \
> >            'u1=random(%d);' \
> >            'u2=u1.reshape((u1.size,1))' % (n,n)
> >    timers = [timeit.Timer(stmt,setup) for stmt in
> >        # 1-d operations; create new arrays
> >        'a0 = m[:,0]-u1; a1 = m[:,1]-u1',
> >        # 2-d in place operation
> >        'm -= u2'
> >    ]
> >    print n, [min(timer.repeat(3,1000)) for timer in timers]
>
> > And some results (Python 2.5, WinXP):
>
> > 10 [0.010832382327921563, 0.0045706926438974782]
> > 100 [0.010882668048592767, 0.021704993232380093]
> > 1000 [0.018272154701226007, 0.19477587235249172]
> > 10000 [0.073787590322233698, 1.9234369172618306]
>
> > So the 2-d in-place modification time grows linearly with the array
> > size, while the 1-d operations are much more efficient despite
> > allocating new arrays. What gives?
>
> This seems to be the effect of broadcasting u2.  If you were to use a
> pre-computed broadcast array instead, you would get rid of that bottleneck:
>
> import timeit
> for n in 10,100,1000,10000:
>    setup = 'import numpy;' \
>            'm=numpy.random.random((%d,2));' \
>            'u1=numpy.random.random(%d);' \
>            'u2=u1[:, numpy.newaxis];' \
>            'u3=numpy.array([u1,u1]).transpose()' % (n,n)
>    timers = [timeit.Timer(stmt,setup) for stmt in
>        # 1-d operations; create new arrays
>        'a0 = m[:,0]-u1; a1 = m[:,1]-u1',
>        # 2-d in place operation (using broadcasting)
>        'm -= u2',
>        # 2-d in-place operation (not forcing broadcasting)
>        'm -= u3'
>    ]
>    print n, [min(timer.repeat(3,1000)) for timer in timers]
>
> gives on my machine:
>
> 10 [0.03213191032409668, 0.012019872665405273, 0.0068600177764892578]
> 100 [0.033048152923583984, 0.06542205810546875, 0.0076580047607421875]
> 1000 [0.040294170379638672, 0.59892702102661133, 0.014600992202758789]
> 10000 [0.32667303085327148, 5.9721651077270508, 0.10261106491088867]

Thank you, indeed broadcasting is the bottleneck here. I had the
impression that broadcasting was a fast operation, i.e. that it doesn't
require physically allocating the target array of the broadcast, but it
seems this is not the case.
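
For what it's worth, here is a minimal sketch of the two approaches outside
of timeit (it uses numpy.broadcast_to and numpy.ascontiguousarray, which
exist in current NumPy releases and are used here only for illustration):
the broadcast operand stays a zero-stride view, while a pre-computed array
like Francesc's u3 physically materializes the repeated column so the
in-place subtraction runs over contiguous data.

import numpy as np

n = 1000
m = np.random.random((n, 2))
u1 = np.random.random(n)
u2 = u1[:, np.newaxis]            # (n, 1) column view, no copy

# The broadcast operand is just a view: its second-axis stride is 0,
# so no separate (n, 2) array is allocated for it.
b = np.broadcast_to(u2, (n, 2))
print(b.strides)                  # e.g. (8, 0)

# Pre-computing the full (n, 2) array (like u3 above) does allocate
# memory, but the in-place subtraction then loops over contiguous data.
u3 = np.ascontiguousarray(b)
m -= u3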

George



