[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Wed Jun 12 20:55:42 EDT 2019


Hi Ralf,

You're right, the problem is with the added keyword argument (which would
appear also if we did not still have to support the old .sum method
override but just dispatched to __array_ufunc__ with `np.add.reduce` -
maybe worse given that it seems likely the reduce method has seen much less
testing in  __array_ufunc__ implementations).

Still, I do think the question stands: we implement a `nansum` for our
ndarray class a certain way, and provide ways to override it (three now, in
fact). Is it really reasonable to expect that we wait 4 versions for other
packages to keep up with this, and thus get stuck with given internal
implementations?

Aside: note that the present version of the nanfunctions relies on turning
the arguments into arrays and copying 0s into it - that suggests that
currently they do not work for duck arrays like Dask.

All the best,

Marten

On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <stefanv at berkeley.edu>
> wrote:
>
>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>> > In a way, I brought it up mostly as a concrete example of an internal
>> > implementation which we cannot change to an objectively cleaner one
>> because
>> > other packages rely on an out-of-date numpy API.
>>
>
> I think this is not the right way to describe the problem (see below).
>
>
>> This, and the comments Nathaniel made on the array function thread, are
>> important to take note of.  Would it be worth updating NEP 18 with a
>> list of pitfalls?  Or should this be a new informational NEP that
>> discusses—on a higher level—the benefits, risks, and design
>> considerations of providing protocols?
>>
>
> That would be a nice thing to do (the higher level one), but in this case
> I think the issue has little to do with NEP 18. The summary of the issue in
> this thread is a little brief, so let me try to clarify.
>
> 1. np.sum gained a new `where=` keyword in 1.17.0
> 2. using np.sum(x) will detect a `x.sum` method if it's present and try to
> use that
> 3. the `_wrapreduction` utility that forwards the function to the method
> will compare signatures of np.sum and x.sum, and throw an error if there's
> a mismatch for any keywords that have a value other than the default
> np._NoValue
>
> Code to check this:
> >>> x1 = np.arange(5)
> >>> x2 = np.asmatrix(x1)
> >>> np.sum(x1)  # works
> >>> np.sum(x2)  # works
> >>> np.sum(x1, where=x1>3)  # works
> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> Note that this is not specific to np.matrix. Using pandas.Series you also
> get a TypeError:
> >>> y = pd.Series(x1)
> >>> np.sum(y)  # works
> >>> np.sum(y, where=y>3)  # pandas throws TypeError
> ...
> TypeError: sum() got an unexpected keyword argument 'where'
>
> The issue is that when we have this kind of forwarding logic, irrespective
> of how it's implemented, new keywords cannot be used until the array-like
> objects with the methods that get forwarded to gain the same keyword.
>
> tl;dr this is simply a cost we have to be aware of when either proposing
> to add new keywords, or when proposing any kind of dispatching logic (in
> this case `_wrapreduction`).
>
> Regarding internal use of  `np.sum(..., where=)`: this should not be done
> until at least 4-5 versions from now, and preferably way longer than that.
> Because doing so will break already released versions of Pandas, Dask, and
> other libraries with array-like objects.
>
> Cheers,
> Ralf
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190612/c13c81c9/attachment.html>


More information about the NumPy-Discussion mailing list