[Numpy-discussion] @ operator

Thu Sep 11 21:47:15 EDT 2014

On Thu, Sep 11, 2014 at 10:10 AM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

>
>
> On Wed, Sep 10, 2014 at 10:08 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Wed, Sep 10, 2014 at 4:53 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> > On Wed, Sep 10, 2014 at 10:52 AM, Pauli Virtanen <pav at iki.fi> wrote:
>> >>
>> >> 09.09.2014, 22:52, Charles R Harris kirjoitti:
>> >> >   1. Should the operator accept array_like for one of the arguments?
>> >> >   2. Does it need to handle __numpy_ufunc__, or will
>> >> >   __array_priority__ serve?
>> >>
>> >> I think the __matmul__ operator implementation should follow that of
>> >> __mul__.
>> >>
>> >> [clip]
>> >> >    3. Do we want PyArray_Matmul in the numpy API?
>> >> >    4. Should a matmul function be supplied by the multiarray module?
>> >> >
>> >> > If 3 and 4 are wanted, should they use the __numpy_ufunc__
>> machinery, or
>> >> > will __array_priority__ serve?
>> >>
>> >> dot() function deals with __numpy_ufunc__, and the matmul() function
>> >> should behave similarly.
>> >>
>> >> It seems dot() uses __array_priority__ for selection of output return
>> >> subclass, so matmul() probably needs do the same thing.
>> >>
>> >> > Note that the type number operators, __add__ and such, currently use
>> >> > __numpy_ufunc__ in combination with __array_priority__, this in
>> addition
>> >> > to
>> >> > the fact that they are by default using ufuncs that do the same. I'd
>> >> > rather
>> >> > that the __*__ operators simply rely on __array_priority__.
>> >>
>> >> The whole business of __array_priority__ and __numpy_ufunc__ in the
>> >> binary ops is solely about when __<op>__ should yield the execution to
>> >> __r<op>__ of the other object.
>> >>
>> >> The rule of operation currently is: "__rmul__ before __numpy_ufunc__"
>> >>
>> >> If you remove the __numpy_ufunc__ handling, it becomes: "__rmul__
>> before
>> >> __numpy_ufunc__, except if array_priority happens to be smaller than
>> >> that of the other class and your class is not an ndarray subclass".
>> >>
>> >> The following binops also do not IIRC respect __array_priority__ in
>> >> preferring right-hand operand:
>> >>
>> >> - in-place operations
>> >> - comparisons
>> >>
>> >> One question here is whether it's possible to change the behavior of
>> >> __array_priority__ here at all, or whether changes are possible only in
>> >> the context of adding new attributes telling Numpy what to do.
>> >
>> > I was tempted to make it a generalized ufunc, which would take care of
>> a lot
>> > of things, but there is a lot of overhead in those functions. Sounds
>> like
>> > the easiest thing is to make it similar to dot, although having an
>> inplace
>> > versions complicates the type selection a bit.
>>
>> Can we please fix the overhead instead of adding more half-complete
>> implementations of the same concepts? I feel like this usually ends up
>> slowing things down in the end, as optimization efforts get divided...
>>
>> My vote is:
>>
>> __matmul__/__rmatmul__ do the standard dispatch stuff that all __op__
>> methods do (so I guess check __array_priority__ or whatever it is we
>> always do). I'd also be okay with ignoring __array_priority__ on the
>> grounds that __numpy_ufunc__ is better, and there's no existing code
>> relying on __array_priority__ support in __matmul__.
>>
>> Having decided that we are actually going to run, they dispatch
>> unconditionally to np.newdot(a, b) (or newdot(a, b, out=a) for the
>> in-place version), similarly to how e.g. __add__ dispatches to np.add.
>>
>> newdot acts like a standard gufunc with all the standard niceties,
>> including __numpy_ufunc__ dispatch.
>>
>> ("newdot" here is intended as a placeholder name, maybe it should be
>> np.linalg.matmul or something else to be bikeshed later. I also vote
>> that eventually 'dot' become an alias for this function, but whether
>> to do that is an orthogonal discussion for later.)
>>
>> If we went the ufunc route, I think we would want three of them, matxvec,
> vecxmat, and matxmat, because the best inner loops would be different in
> the three cases, but they couldn't be straight ufuncs themselves, as we
> don't need the other options, `reduce`, etc., but they can't be exactly
> like the linalg machinery, because we do want subclasses to be able to
> override. Hmm...
>
> The ufunc machinery has some funky aspects. For instance, there are
> hardwired checks for `__radd__` and other such operators in
> PyUFunc_GenericFunction  that allows subclasses to overide the ufunc. Those
> options should really be part of the PyUFuncObject.
>

Here are some timing results for the inner product of an integer array
comparing inner, ufunc (inner1d), and einsum implementations. The np.long
type was chosen because it is implemented for inner1d and to avoid the
effect of using cblas.

In [43]: a = ones(100000, dtype=np.long)

In [44]: timeit inner(a,a)
10000 loops, best of 3: 55.5 µs per loop

In [45]: timeit inner1d(a,a)
10000 loops, best of 3: 56.2 µs per loop

In [46]: timeit einsum('...i,...i',a,a)
10000 loops, best of 3: 43.8 µs per loop

In [47]: a = ones(1, dtype=np.long)

In [48]: timeit inner(a,a)
1000000 loops, best of 3: 484 ns per loop

In [49]: timeit inner1d(a,a)
1000000 loops, best of 3: 741 ns per loop

In [50]: timeit einsum('...i,...i',a,a)
1000000 loops, best of 3: 811 ns per loop

For big arrays, einsum has a better inner loop. For small arrays, einsum
and inner1d suffer from call overhead. Note that the einsum overhead could
be improved by special casing, and that the various loops could be
consolidated into best of breed.

The ufunc implementation is easy to do for all cases.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20140911/c67bef59/attachment.html>