[Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

Sebastian Berg sebastian at sipsolutions.net
Thu May 31 09:53:51 EDT 2018


<snip>


> > 
> > I'm currently -0.5 on both fixed dimensions and this broadcasting
> > dimension idea. My reasoning is:
> > 
> > - The use cases seem fairly esoteric. For fixed dimensions, I guess
> > the motivating example is cross-product (are there any others?). But
> > would it be so bad for a cross-product gufunc to raise an error if it
> > receives the wrong number of dimensions? For this broadcasting case...
> > well, obviously we've survived this long without all_equal :-). And
> > there's something funny about all_equal, since it's really smushing
> > together two conceptually separate gufuncs for efficiency. Should we
> > also have all_less_than, sum_square, ...? If this is a big problem,
> > then wouldn't it be better to solve it in a general way, like dask or
> > Numba or numexpr do? To be clear, I'm not saying these features are
> > necessarily *bad* ideas, in isolation -- just that the benefits aren't
> > very convincing, and there are trade-offs, like:
> 
> I have often wished numpy had these short-circuiting gufuncs, for a
> very long time. I specifically remember my fruitless searches for how
> to do it back to 2007.
> 
> While "on average" short-circuiting only gives a speedup of 2x, in
> many situations you can arrange your algorithm so short-circuiting
> will happen early, e.g. usually in the first 10 elements of a 10^6
> element array, giving enormous speedups.

> Also, I do not imagine these as free-floating ufuncs; I think we can
> arrange them in a logical way in a gufunc ecosystem. There would be
> some "core ufuncs", with "associated gufuncs" accessible as
> attributes. For instance, any_less_than will be accessible as
> less.any.
> 
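
To pin down what the proposed associated methods (less.any, less.all, less.first, less.count) would return, here is a minimal sketch using existing NumPy calls. The names less_any, less_all, less_first and less_count are hypothetical stand-ins, not NumPy API. They reduce over the last axis only, and unlike the proposal they build the full boolean intermediate and never short-circuit; they only fix the semantics.

```python
import numpy as np

# Hypothetical reference semantics for the proposed less.any / less.all /
# less.first / less.count, reducing over the last axis. These build the
# whole boolean mask first, so they do NOT short-circuit.

def less_any(a, b):
    return np.any(np.less(a, b), axis=-1)

def less_all(a, b):
    return np.all(np.less(a, b), axis=-1)

def less_first(a, b):
    """Index of the first match along the last axis, or -1 if none."""
    mask = np.less(a, b)
    idx = np.argmax(mask, axis=-1)           # argmax gives 0 when nothing matches
    return np.where(mask.any(axis=-1), idx, -1)

def less_count(a, b):
    return np.count_nonzero(np.less(a, b), axis=-1)

a = np.array([[5, 1, 7, 0],
              [9, 9, 9, 9]])
b = np.array([2, 2, 2, 2])
print(less_any(a, b))    # [ True False]
print(less_all(a, b))    # [False False]
print(less_first(a, b))  # [ 1 -1]
print(less_count(a, b))  # [2 0]
```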

So then, why is it a gufunc and not an attribute using a ufunc with
binary output? I have asked this before, and even got arguments as to
why it fits gufuncs better, but frankly I still do not really
understand.

If it is an associated gufunc, why gufunc at all? We need any() and
all() here, so that is not that many methods, right? And when it comes
to buffering you have much more flexibility.

Say I have the operation:

(float_arr > int_arr).all(axis=(1, 2))

With int_arr being shaped (2, 1000, 1000) (i.e. large along the
interesting axes). A normal gufunc IIRC will get the whole inner
dimension as a float buffer. In other words, you gain practically
nothing, because the whole int_arr will be cast to float anyway.
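
For comparison, a small sketch of what today's spelling of the operation does with these shapes (random data, purely for illustration): the common type of the inputs is float64, so the integers are converted before comparing, and a full boolean intermediate is allocated before .all() runs.

```python
import numpy as np

# Shapes from the example: large along the reduced axes (1, 2).
rng = np.random.default_rng(0)
float_arr = rng.random((2, 1000, 1000))
int_arr = rng.integers(0, 2, size=(2, 1000, 1000))

# The common type is float64, so every int element gets converted.
print(np.result_type(float_arr, int_arr))   # float64

# A full boolean mask (one byte per element) is materialized first,
# even if the first element of each outer slice is already False.
mask = float_arr > int_arr
print(mask.shape, mask.nbytes)              # (2, 1000, 1000) 2000000
result = mask.all(axis=(1, 2))
print(result.shape)                         # (2,)
```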

If, however, you actually implement np.greater.all(float_arr,
int_arr, axis=(1, 2)) as a separate ufunc method, you would have the
freedom to work in typical cache-friendly, buffer-sized chunks for
each of the outer dimensions, one at a time. A gufunc would require you
to say: please do not buffer for me, or to implement all possible type
combinations to do this.
(Of course there are memory layout subtleties, since you would always
have to optimize for the "fast exit" case, potentially making the
worst-case scenario much worse -- unless you do seriously fancy stuff
anyway.)
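
The chunked strategy can be sketched in pure Python, as an illustration only. Assumptions: greater_all_chunked is a hypothetical name, both inputs have the same shape (no broadcasting), the reduction runs over every axis except the first, and the chunk length defaults to np.getbufsize() (8192 by default), used here as an element count. A real implementation would live in C inside the ufunc machinery.

```python
import numpy as np

def greater_all_chunked(a, b, chunk=np.getbufsize()):
    """Hypothetical short-circuiting ``greater.all`` over all axes but the first.

    Assumes ``a`` and ``b`` have identical shapes. Each outer slice is scanned
    in chunks; only the current chunk is cast to the common dtype, and the
    scan of a slice stops at the first chunk that contains a False.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    common = np.result_type(a, b)
    out = np.ones(a.shape[0], dtype=bool)
    for i in range(a.shape[0]):
        # reshape(-1) is a view for contiguous slices, a copy otherwise
        a_flat = a[i].reshape(-1)
        b_flat = b[i].reshape(-1)
        for start in range(0, a_flat.size, chunk):
            stop = start + chunk
            a_chunk = a_flat[start:stop].astype(common, copy=False)
            b_chunk = b_flat[start:stop].astype(common, copy=False)
            if not np.all(a_chunk > b_chunk):
                out[i] = False
                break  # fast exit: skip the rest of this outer slice
    return out

rng = np.random.default_rng(0)
float_arr = rng.random((2, 1000, 1000)) + 1.0   # values in [1, 2)
int_arr = np.zeros((2, 1000, 1000), dtype=np.int64)
int_arr[1, 0, 0] = 5                            # slice 1 fails immediately
print(greater_all_chunked(float_arr, int_arr))  # [ True False]
print(np.array_equal(greater_all_chunked(float_arr, int_arr),
                     (float_arr > int_arr).all(axis=(1, 2))))  # True
```

This also shows the worst-case trade-off: slice 0 is entirely True, so it is scanned chunk by chunk to the end, here with Python loop overhead on top.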

A more general question is whether we should instead focus on
solving the same problem more generally.
For example, if `numexpr` implemented all/any reductions, it might be
able to get the identical trade-offs quite simply, with even more
flexibility! (I have to admit, it may get tricky with multiple
reduction dimensions, etc.)

- Sebastian 


> Binary "comparison" ufuncs would have attributes:
> 
> less.any
> less.all
> less.first  # returns first matching index
> less.count  # counts matches without intermediate bool array
> 
> This adds on to the existing attributes, for instance
> ufuncs already have:
> 
> add.reduce
> add.accumulate
> add.reduceat
> add.outer
> add.at
> 
> It is unfortunate that all ufuncs currently have these attributes
> even if they are unimplemented/inappropriate (e.g., np.sin.reduce). I
> would like to remove the inappropriate ones, so each core ufunc will
> only have the appropriate "associated gufunc" attributes.
> 
> Incidentally, once we make reduce/accumulate/... into "associated
> gufuncs", I propose completely removing the "method" argument of
> __array_ufunc__, since it is no longer needed and adds a lot of
> complexity which implementors of __array_ufunc__ are forced to
> account for.
> 
> Cheers,
> Allan
> 
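
To illustrate the existing methods and the "method" argument mentioned above, a minimal sketch of an __array_ufunc__ implementor. MethodLogger is a hypothetical class: it records the method string NumPy passes for each existing ufunc method and then forwards the call to the same method on plain arrays. It also shows that np.sin carries a reduce attribute even though calling it fails.

```python
import numpy as np

class MethodLogger:
    """Hypothetical container that records which ufunc method is dispatched.

    __array_ufunc__ receives (ufunc, method, *inputs, **kwargs), where method
    is "__call__", "reduce", "accumulate", "reduceat", "outer" or "at".
    """

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        self.calls.append((ufunc.__name__, method))
        # Unwrap our own instances, then forward to the same ufunc method.
        args = [x.data if isinstance(x, MethodLogger) else x for x in inputs]
        return getattr(ufunc, method)(*args, **kwargs)

w = MethodLogger([1, 2, 3, 4])
np.add(w, 1)
np.add.reduce(w)
np.add.accumulate(w)
np.add.reduceat(w, [0, 2])
np.add.outer(w, w)
np.add.at(w, [0], 10)          # in place on w.data
print([m for _, m in w.calls])
# ['__call__', 'reduce', 'accumulate', 'reduceat', 'outer', 'at']

# np.sin has a .reduce attribute like every ufunc, but reducing needs
# a two-input ufunc, so the call raises ValueError.
try:
    np.sin.reduce([0.0, 1.0])
except ValueError as exc:
    sin_reduce_error = str(exc)
    print(sin_reduce_error)
```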
> > 
> > - When it comes to the core ufunc machinery, we have a limited
> > complexity budget. I'm nervous that if we add too many bells and
> > whistles, we'll end up writing ourselves into a corner where we have
> > trouble maintaining it, where it becomes difficult to predict how
> > different features interact, and where it becomes increasingly
> > difficult for third parties to handle all the different features in
> > their __array_ufunc__ methods...
> > 
> > - And, we have a lot of other demands on the core ufunc machinery
> > that might be better places to spend our limited complexity budget.
> > For example, can we come up with an extension to make np.sort a
> > gufunc? That seems like a much higher priority than figuring out how
> > to make all_equal a gufunc. What about refactoring the ufunc machinery
> > to support user-defined dtypes? That'll need some serious work, and
> > again, it's probably higher priority than supporting cross-product or
> > all_equal directly (or at least it seems that way to me).
> > 
> > Maybe there are more compelling use cases that I'm missing, but as it
> > is, I feel like trying to add too many features to the current ufunc
> > machinery is pretty risky for future maintainability, and we shouldn't
> > do it without really solid use cases.
> > 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 
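
As for the question of whether a cross-product could simply raise an error on the wrong shape, a minimal sketch: checked_cross is a hypothetical wrapper around np.cross that rejects anything but length-3 vectors in the last axis, while the remaining (loop) dimensions broadcast as usual. The check is explicit because np.cross itself also accepts length-2 vectors.

```python
import numpy as np

def checked_cross(a, b):
    """Hypothetical wrapper: cross product over the last axis that only
    accepts length-3 vectors, standing in for a fixed core dimension."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape[-1:] != (3,) or b.shape[-1:] != (3,):
        raise ValueError(
            f"expected trailing dimension 3, got {a.shape} and {b.shape}")
    return np.cross(a, b)  # leading dimensions broadcast as usual

x = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y = np.array([0.0, 1.0, 0.0])
print(checked_cross(x, y))  # rows: x[0] cross y = [0, 0, 1]; x[1] cross y = [0, 0, 0]

try:
    checked_cross([1.0, 2.0], [3.0, 4.0])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)    # expected trailing dimension 3, got (2,) and (2,)
```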

