[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

Ralf Gommers ralf.gommers at gmail.com
Thu Jun 13 04:17:31 EDT 2019


On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>> Hi Ralf,
>>
>> You're right, the problem is with the added keyword argument (which would
>> appear also if we did not still have to support the old .sum method
>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>> maybe worse given that it seems likely the reduce method has seen much less
>> testing in  __array_ufunc__ implementations).
>>
>> Still, I do think the question stands: we implement a `nansum` for our
>> ndarray class a certain way, and provide ways to override it (three now, in
>> fact). Is it really reasonable to expect that we wait 4 versions for other
>> packages to keep up with this, and thus get stuck with given internal
>> implementations?
>>
>
In general, I'd say yes. This is a problem of our own making. If an
implementation uses np.sum on an argument that can be a subclass or duck
array (i.e. we didn't coerce with asarray), then the code is calling out to
outside of numpy. At that point we have effectively made something public.
We can still change it, but only if we're sure that we don't break things
in the process (i.e. we then insert a new asarray, something you're not
happy about in general ....).

I don't get the "three ways to override nansum" by the way. There's only
one I think: __array_function__. There may be more for the sum() method.

Another way to say the same thing: if a subclass does everything right, and
we decide to add a new keyword to some function in 1.17 without considering
those subclasses, then turning around straight after and saying "oh those
subclasses rely on our old API, it may be okay to either break them or send
them a deprecation warning" (which is I think what you were advocating for
- your 4 and 3 options) is not reasonable.


>> Aside: note that the present version of the nanfunctions relies on
>> turning the arguments into arrays and copying 0s into it - that suggests
>> that currently they do not work for duck arrays like Dask.
>>
>
There being an issue with matrix in the PR suggests there is an issue for
subclasses at least?


> Agreed. We could safely rewrite things to use np.asarray(), without any
> need to worry about backends compatibility. From an API perspective,
> nothing would change -- we already cast inputs into base numpy arrays
> inside the _replace_nan() routine.
>

In that case yes, nansum can be rewritten. However, it's only because of
the use of (as)array - if it were asanyarray that would already scupper the
plan.

Cheers,
Ralf


>
>>
>> All the best,
>>
>> Marten
>>
>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>> stefanv at berkeley.edu> wrote:
>>>
>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>> > In a way, I brought it up mostly as a concrete example of an internal
>>>> > implementation which we cannot change to an objectively cleaner one
>>>> because
>>>> > other packages rely on an out-of-date numpy API.
>>>>
>>>
>>> I think this is not the right way to describe the problem (see below).
>>>
>>>
>>>> This, and the comments Nathaniel made on the array function thread, are
>>>> important to take note of.  Would it be worth updating NEP 18 with a
>>>> list of pitfalls?  Or should this be a new informational NEP that
>>>> discusses—on a higher level—the benefits, risks, and design
>>>> considerations of providing protocols?
>>>>
>>>
>>> That would be a nice thing to do (the higher level one), but in this
>>> case I think the issue has little to do with NEP 18. The summary of the
>>> issue in this thread is a little brief, so let me try to clarify.
>>>
>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>>> to use that
>>> 3. the `_wrapreduction` utility that forwards the function to the method
>>> will compare signatures of np.sum and x.sum, and throw an error if there's
>>> a mismatch for any keywords that have a value other than the default
>>> np._NoValue
>>>
>>> Code to check this:
>>> >>> x1 = np.arange(5)
>>> >>> x2 = np.asmatrix(x1)
>>> >>> np.sum(x1)  # works
>>> >>> np.sum(x2)  # works
>>> >>> np.sum(x1, where=x1>3)  # works
>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>> ...
>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>
>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>> also get a TypeError:
>>> >>> y = pd.Series(x1)
>>> >>> np.sum(y)  # works
>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>> ...
>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>
>>> The issue is that when we have this kind of forwarding logic,
>>> irrespective of how it's implemented, new keywords cannot be used until the
>>> array-like objects with the methods that get forwarded to gain the same
>>> keyword.
>>>
>>> tl;dr this is simply a cost we have to be aware of when either proposing
>>> to add new keywords, or when proposing any kind of dispatching logic (in
>>> this case `_wrapreduction`).
>>>
>>> Regarding internal use of  `np.sum(..., where=)`: this should not be
>>> done until at least 4-5 versions from now, and preferably way longer than
>>> that. Because doing so will break already released versions of Pandas,
>>> Dask, and other libraries with array-like objects.
>>>
>>> Cheers,
>>> Ralf
>>>
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/498e0de0/attachment.html>


More information about the NumPy-Discussion mailing list