[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

Marten van Kerkwijk m.h.vankerkwijk at gmail.com
Thu Jun 13 08:50:38 EDT 2019


Hi All,

`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the
plan to use `asarray` - sorry to have been sloppy with stating they
"convert to array".

And, yes, I'll admit to forgetting that we are introducing `where` only in
1.17 - clearly we cannot rely on other classes to have already implemented
it!! (This is the problem with having done it myself - it seems ages ago!)

For the old implementation, there was indeed only one way to override
(`__array_function__`) for non-subclasses, but for the new implementation
there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`

I think that eventually one would like to move to the case where there is
only one (`__array_ufunc__` in this case, since really it is just a
combination of two ufuncs, isnan and add.reduce in this case).

 Anyway, I guess this is still a good example to consider for how we should
go about getting to a new implementation, ideally with just a single-way to
override?

Indeed, how do we actually envisage deprecating the use of
`__array_function__` for a given part of the numpy API? Are we allowed to
go cold-turkey if the new implementation is covered by `__array_ufunc__`?

All the best,

Marten



On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk <
>> m.h.vankerkwijk at gmail.com> wrote:
>>
>>> Hi Ralf,
>>>
>>> You're right, the problem is with the added keyword argument (which
>>> would appear also if we did not still have to support the old .sum method
>>> override but just dispatched to __array_ufunc__ with `np.add.reduce` -
>>> maybe worse given that it seems likely the reduce method has seen much less
>>> testing in  __array_ufunc__ implementations).
>>>
>>> Still, I do think the question stands: we implement a `nansum` for our
>>> ndarray class a certain way, and provide ways to override it (three now, in
>>> fact). Is it really reasonable to expect that we wait 4 versions for other
>>> packages to keep up with this, and thus get stuck with given internal
>>> implementations?
>>>
>>
> In general, I'd say yes. This is a problem of our own making. If an
> implementation uses np.sum on an argument that can be a subclass or duck
> array (i.e. we didn't coerce with asarray), then the code is calling out to
> outside of numpy. At that point we have effectively made something public.
> We can still change it, but only if we're sure that we don't break things
> in the process (i.e. we then insert a new asarray, something you're not
> happy about in general ....).
>
> I don't get the "three ways to override nansum" by the way. There's only
> one I think: __array_function__. There may be more for the sum() method.
>
> Another way to say the same thing: if a subclass does everything right,
> and we decide to add a new keyword to some function in 1.17 without
> considering those subclasses, then turning around straight after and saying
> "oh those subclasses rely on our old API, it may be okay to either break
> them or send them a deprecation warning" (which is I think what you were
> advocating for - your 4 and 3 options) is not reasonable.
>
>
>>> Aside: note that the present version of the nanfunctions relies on
>>> turning the arguments into arrays and copying 0s into it - that suggests
>>> that currently they do not work for duck arrays like Dask.
>>>
>>
> There being an issue with matrix in the PR suggests there is an issue for
> subclasses at least?
>
>
>> Agreed. We could safely rewrite things to use np.asarray(), without any
>> need to worry about backends compatibility. From an API perspective,
>> nothing would change -- we already cast inputs into base numpy arrays
>> inside the _replace_nan() routine.
>>
>
> In that case yes, nansum can be rewritten. However, it's only because of
> the use of (as)array - if it were asanyarray that would already scupper the
> plan.
>
> Cheers,
> Ralf
>
>
>>
>>>
>>> All the best,
>>>
>>> Marten
>>>
>>> On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <
>>>> stefanv at berkeley.edu> wrote:
>>>>
>>>>> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
>>>>> > In a way, I brought it up mostly as a concrete example of an internal
>>>>> > implementation which we cannot change to an objectively cleaner one
>>>>> because
>>>>> > other packages rely on an out-of-date numpy API.
>>>>>
>>>>
>>>> I think this is not the right way to describe the problem (see below).
>>>>
>>>>
>>>>> This, and the comments Nathaniel made on the array function thread, are
>>>>> important to take note of.  Would it be worth updating NEP 18 with a
>>>>> list of pitfalls?  Or should this be a new informational NEP that
>>>>> discusses—on a higher level—the benefits, risks, and design
>>>>> considerations of providing protocols?
>>>>>
>>>>
>>>> That would be a nice thing to do (the higher level one), but in this
>>>> case I think the issue has little to do with NEP 18. The summary of the
>>>> issue in this thread is a little brief, so let me try to clarify.
>>>>
>>>> 1. np.sum gained a new `where=` keyword in 1.17.0
>>>> 2. using np.sum(x) will detect a `x.sum` method if it's present and try
>>>> to use that
>>>> 3. the `_wrapreduction` utility that forwards the function to the
>>>> method will compare signatures of np.sum and x.sum, and throw an error if
>>>> there's a mismatch for any keywords that have a value other than the
>>>> default np._NoValue
>>>>
>>>> Code to check this:
>>>> >>> x1 = np.arange(5)
>>>> >>> x2 = np.asmatrix(x1)
>>>> >>> np.sum(x1)  # works
>>>> >>> np.sum(x2)  # works
>>>> >>> np.sum(x1, where=x1>3)  # works
>>>> >>> np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
>>>> ...
>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>
>>>> Note that this is not specific to np.matrix. Using pandas.Series you
>>>> also get a TypeError:
>>>> >>> y = pd.Series(x1)
>>>> >>> np.sum(y)  # works
>>>> >>> np.sum(y, where=y>3)  # pandas throws TypeError
>>>> ...
>>>> TypeError: sum() got an unexpected keyword argument 'where'
>>>>
>>>> The issue is that when we have this kind of forwarding logic,
>>>> irrespective of how it's implemented, new keywords cannot be used until the
>>>> array-like objects with the methods that get forwarded to gain the same
>>>> keyword.
>>>>
>>>> tl;dr this is simply a cost we have to be aware of when either
>>>> proposing to add new keywords, or when proposing any kind of dispatching
>>>> logic (in this case `_wrapreduction`).
>>>>
>>>> Regarding internal use of  `np.sum(..., where=)`: this should not be
>>>> done until at least 4-5 versions from now, and preferably way longer than
>>>> that. Because doing so will break already released versions of Pandas,
>>>> Dask, and other libraries with array-like objects.
>>>>
>>>> Cheers,
>>>> Ralf
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at python.org
>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190613/e31edc12/attachment-0001.html>


More information about the NumPy-Discussion mailing list