[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

Ralf Gommers ralf.gommers at gmail.com
Thu Jun 13 11:15:09 EDT 2019


On Thu, Jun 13, 2019 at 2:51 PM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> Hi All,
>
> `nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers
> the plan to use `asarray` - sorry to have been sloppy with stating they
> "convert to array".
>
> And, yes, I'll admit to forgetting that we are introducing `where` only in
> 1.17 - clearly we cannot rely on other classes to have already implemented
> it!! (This is the problem with having done it myself - it seems ages ago!)
>
> For the old implementation, there was indeed only one way to override
> (`__array_function__`) for non-subclasses, but for the new implementation
> there are three ways, though the first two overlap:
> - support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
> - support `__array_ufunc__` including reductions (with all their arguments)
> - support `__array_function__`
>
> I think that eventually one would like to move to the case where there is
> only one (`__array_ufunc__` in this case, since really it is just a
> combination of two ufuncs, isnan and add.reduce in this case).
>

I don't think the other ones are really ways of "overriding nansum". They
happen to work, but rely on internal implementation details and then
overriding functions inside those details.

It sounds like you're suggestion a new way of declaring "this is a
canonical implementation that duck arrays can rely on". I can see that
happening, but we probably want some standard way to mark those functions,
have a note in the docs and some tests.


>  Anyway, I guess this is still a good example to consider for how we
> should go about getting to a new implementation, ideally with just a
> single-way to override?
>
> Indeed, how do we actually envisage deprecating the use of
> `__array_function__` for a given part of the numpy API? Are we allowed to
> go cold-turkey if the new implementation is covered by `__array_ufunc__`?
>

I think __array_function__ is still the best way to do this (that's the
only actual override, so most robust and performant likely), so I don't see
any reason for a deprecation.

Cheers,
Ralf



> All the best,
>
> Marten
>
>
>
> On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>>
>>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>>> m.h.vankerkwijk at gmail.com> wrote:
>>>
>>>> Hi Ralf,
>>>>
>>>> You're right, the problem is with the added keyword argument (which
>>>> would appear also if we did not still have to support the old .sum method
>>>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>>>> maybe worse given that it seems likely the reduce method has seen much less
>>>> testing in  __array_ufunc__ implementations).
>>>>
>>>> Still, I do think the question stands: we implement a `nansum` for our
>>>> ndarray class a certain way, and provide ways to override it (three now, in
>>>> fact). Is it really reasonable to expect that we wait 4 versions for other
>>>> packages to keep up with this, and thus get stuck with given internal
>>>> implementations?
>>>>
>>>
>> In general, I'd say yes. This is a problem of our own making. If an
>> implementation uses np.sum on an argument that can be a subclass or duck
>> array (i.e. we didn't coerce with asarray), then the code is calling out to
>> outside of numpy. At that point we have effectively made something public.
>> We can still change it, but only if we're sure that we don't break things
>> in the process (i.e. we then insert a new asarray, something you're not
>> happy about in general ....).
>>
>> I don't get the "three ways to override nansum" by the way. There's only
>> one I think: __array_function__. There may be more for the sum() method.
>>
>> Another way to say the same thing: if a subclass does everything right,
>> and we decide to add a new keyword to some function in 1.17 without
>> considering those subclasses, then turning around straight after and saying
>> "oh those subclasses rely on our old API, it may be okay to either break
>> them or send them a deprecation warning" (which is I think what you were
>> advocating for - your 4 and 3 options) is not reasonable.
>>
>>
>>>> Aside: note that the present version of the nanfunctions relies on
>>>> turning the arguments into arrays and copying 0s into it - that suggests
>>>> that currently they do not work for duck arrays like Dask.
>>>>
>>>
>> There being an issue with matrix in the PR suggests there is an issue for
>> subclasses at least?
>>
>>
>>> Agreed. We could safely rewrite things to use np.asarray(), without any
>>> need to worry about backends compatibility. From an API perspective,
>>> nothing would change -- we already cast inputs into base numpy arrays
>>> inside the _replace_nan() routine.
>>>
>>
>> In that case yes, nansum can be rewritten. However, it's only because of
>> the use of (as)array - if it were asanyarray that would already scupper the
>> plan.
>>
>> Cheers,
>> Ralf
>>
>>
>>>
>>>>
>>>> All the best,
>>>>
>>>> Marten
>>>>
>>>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>>>> stefanv at berkeley.edu> wrote:
>>>>>
>>>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>>>> > In a way, I brought it up mostly as a concrete example of an
>>>>>> internal
>>>>>> > implementation which we cannot change to an objectively cleaner one
>>>>>> because
>>>>>> > other packages rely on an out-of-date numpy API.
>>>>>>
>>>>>
>>>>> I think this is not the right way to describe the problem (see below).
>>>>>
>>>>>
>>>>>> This, and the comments Nathaniel made on the array function thread,
>>>>>> are
>>>>>> important to take note of.  Would it be worth updating NEP 18 with a
>>>>>> list of pitfalls?  Or should this be a new informational NEP that
>>>>>> discusses—on a higher level—the benefits, risks, and design
>>>>>> considerations of providing protocols?
>>>>>>
>>>>>
>>>>> That would be a nice thing to do (the higher level one), but in this
>>>>> case I think the issue has little to do with NEP 18. The summary of the
>>>>> issue in this thread is a little brief, so let me try to clarify.
>>>>>
>>>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and
>>>>> try to use that
>>>>> 3. the `_wrapreduction` utility that forwards the function to the
>>>>> method will compare signatures of np.sum and x.sum, and throw an error if
>>>>> there's a mismatch for any keywords that have a value other than the
>>>>> default np._NoValue
>>>>>
>>>>> Code to check this:
>>>>> >>> x1 = np.arange(5)
>>>>> >>> x2 = np.asmatrix(x1)
>>>>> >>> np.sum(x1)  # works
>>>>> >>> np.sum(x2)  # works
>>>>> >>> np.sum(x1, where=x1>3)  # works
>>>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>>>> ...
>>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>>
>>>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>>>> also get a TypeError:
>>>>> >>> y = pd.Series(x1)
>>>>> >>> np.sum(y)  # works
>>>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>>>> ...
>>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>>
>>>>> The issue is that when we have this kind of forwarding logic,
>>>>> irrespective of how it's implemented, new keywords cannot be used until the
>>>>> array-like objects with the methods that get forwarded to gain the same
>>>>> keyword.
>>>>>
>>>>> tl;dr this is simply a cost we have to be aware of when either
>>>>> proposing to add new keywords, or when proposing any kind of dispatching
>>>>> logic (in this case `_wrapreduction`).
>>>>>
>>>>> Regarding internal use of  `np.sum(..., where=)`: this should not be
>>>>> done until at least 4-5 versions from now, and preferably way longer than
>>>>> that. Because doing so will break already released versions of Pandas,
>>>>> Dask, and other libraries with array-like objects.
>>>>>
>>>>> Cheers,
>>>>> Ralf
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at python.org
>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/fe853d72/attachment-0001.html>


More information about the NumPy-Discussion mailing list