[Numpy-discussion] understanding buffering done when broadcasting

Tue Nov 24 19:49:44 EST 2015

On Mon, Nov 23, 2015 at 2:09 PM, Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote:
> > Hello,
> >
> >
> > I'm trying to understand the buffering done by the Numpy iterator
> > interface (the new post 1.6-one) when running ufuncs on arrays that
> > require broadcasting. Consider this simple case:
> >
> > In [35]: m = np.arange(16).reshape(4,4)
> > In [37]: n = np.arange(4)
> > In [39]: m + n
> > Out[39]:
> > array([[ 0,  2,  4,  6],
> >        [ 4,  6,  8, 10],
> >        [ 8, 10, 12, 14],
> >        [12, 14, 16, 18]])
> >
> >
>
>
> On first sight this seems true. However, there is one other point to
> consider here. The inner ufunc loop can only handle a single stride. The
> contiguous array `n` has to be iterated as if it had the strides
> `(0, 8)`, which is not the strides of the contiguous array `m` which can
> be "unrolled" to 1-D. Those effective strides are thus not contiguous
> for the inner ufunc loop and cannot be unrolled into a single ufunc
> call!
>

> The optimization (which might kick in a bit more broadly maybe), is thus
> that the number of inner loop calls is minimized, whether that is worth
> it, I am not sure, it may well be that there is some worthy optimization
> possible here.
> Note however, that this does not occur for large inner loop sizes
> (though I think you can find some "bad" sizes):
>
> ```
> In [18]: n = np.arange(40000)
>
> In [19]: m = np.arange(160000).reshape(4,40000)
>
> In [20]: o = m + n
> Iterator: Checking casting for operand 0
> op: dtype('int64'), iter: dtype('int64')
> Iterator: Checking casting for operand 1
> op: dtype('int64'), iter: dtype('int64')
> Iterator: Checking casting for operand 2
> op: <null>, iter: dtype('int64')
> Iterator: Setting allocated stride 1 for iterator dimension 0 to 8
> Iterator: Setting allocated stride 0 for iterator dimension 1 to 320000
> Iterator: Copying inputs to buffers
> Iterator: Expanding inner loop size from 8192 to 40000 since buffering
> wasn't needed
> Any buffering needed: 0
> Iterator: Finished copying inputs to buffers (buffered size is 40000)
> ```
>

The heuristic in the code says that if we can use a single stride and
that's larger than the buffer size (which I assume is the default buffer
size, and can change) then it's "is_onestride" and no buffering is done.

So this led me to explore around this threshold (8192 items by default on
my machine), and indeed we can notice funny behavior:

In [51]: %%timeit n = arange(8192); m = np.arange(8192*40).reshape(40,8192)
o = m + n
   ....:
1000 loops, best of 3: 274 µs per loop

In [52]: %%timeit n = arange(8292); m = np.arange(8292*40).reshape(40,8292)
o = m + n
   ....:
1000 loops, best of 3: 229 µs per loop

So, given this, it's not very clear why the "optimization" kicks in.
Buffering for small sizes seems like a mistake.

Eli

>
> Anyway, feel free to have a look ;). The code is not the most read one
> in NumPy, and it would not surprise me a lot if you can find something
> to tweak.
>
> - Sebastian
>
>
> >
> > If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), I see
> > that when the add() ufunc is called, 'n' is copied into a temporary
> > buffer by the iterator. The ufunc then gets the buffer as its data.
> >
> >
> > My question is: why is this buffering needed? It seems wasteful, since
> > no casting is required here, no special alignment problems and also
> > 'n' is contiguously laid out in memory. It seems that it would be more
> > efficient to just use 'n' in the ufunc instead of passing in the
> > buffer. What am I missing?
> >
> >
> > Thanks in advance,
> >
> > Eli
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20151124/b3e5506c/attachment.html>