[Numpy-discussion] Extent to which to work around matrix and other duck/subclass limitations

Ralf Gommers ralf.gommers at gmail.com
Wed Jun 12 16:31:54 EDT 2019


On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
> > In a way, I brought it up mostly as a concrete example of an internal
> > implementation which we cannot change to an objectively cleaner one
> > because other packages rely on an out-of-date numpy API.
>

I think this is not the right way to describe the problem (see below).


> This, and the comments Nathaniel made on the array function thread, are
> important to take note of.  Would it be worth updating NEP 18 with a
> list of pitfalls?  Or should this be a new informational NEP that
> discusses—on a higher level—the benefits, risks, and design
> considerations of providing protocols?
>

That would be a nice thing to do (the higher level one), but in this case I
think the issue has little to do with NEP 18. The summary of the issue in
this thread is a little brief, so let me try to clarify.

1. np.sum gained a new `where=` keyword in 1.17.0.
2. Calling np.sum(x) will detect an `x.sum` method if it's present and try
   to use that.
3. The `_wrapreduction` utility that forwards the function to the method
   passes along any keywords that have a value other than the default
   np._NoValue, so a TypeError is raised if `x.sum` does not accept one of
   them.

Code to check this:
>>> import numpy as np
>>> x1 = np.arange(5)
>>> x2 = np.asmatrix(x1)
>>> np.sum(x1)  # works
>>> np.sum(x2)  # works
>>> np.sum(x1, where=x1 > 3)  # works
>>> np.sum(x2, where=x2 > 3)  # raises TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

Note that this is not specific to np.matrix. Using pandas.Series you also
get a TypeError:
>>> import pandas as pd
>>> y = pd.Series(x1)
>>> np.sum(y)  # works
>>> np.sum(y, where=y > 3)  # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'

The issue is that with this kind of forwarding logic, irrespective of how
it's implemented, a new keyword cannot be used on array-like objects until
the methods that get forwarded to gain the same keyword.
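
To make the mechanism concrete, here is a minimal pure-Python sketch of this
kind of forwarding. It is not NumPy's actual code: `forwarding_sum`,
`builtin_sum`, `OldDuckArray`, `NewDuckArray` and the local `_NoValue`
sentinel are all hypothetical names made up for illustration, and the real
`_wrapreduction` handles more arguments than shown here.

```python
# Hypothetical stand-ins; none of these names are NumPy's actual internals.
_NoValue = object()  # sentinel meaning "the caller did not pass this keyword"


def builtin_sum(values, where=_NoValue):
    """Fallback reduction for plain sequences, with an optional boolean mask."""
    if where is _NoValue:
        return sum(values)
    return sum(v for v, keep in zip(values, where) if keep)


def forwarding_sum(obj, where=_NoValue):
    """Forward to obj.sum if it exists, passing only explicitly given keywords."""
    kwargs = {}
    if where is not _NoValue:
        kwargs["where"] = where
    method = getattr(obj, "sum", None)
    if method is not None:
        # A TypeError surfaces here if the method predates the keyword.
        return method(**kwargs)
    return builtin_sum(obj, **kwargs)


class OldDuckArray:
    """Array-like object released before the `where` keyword existed."""

    def __init__(self, values):
        self.values = list(values)

    def sum(self):
        return sum(self.values)


class NewDuckArray(OldDuckArray):
    """The same array-like object after it gained the new keyword."""

    def sum(self, where=_NoValue):
        return builtin_sum(self.values, where=where)


data = [0, 1, 2, 3, 4]
mask = [v > 3 for v in data]

old_total = forwarding_sum(OldDuckArray(data))               # no new keyword used
new_masked = forwarding_sum(NewDuckArray(data), where=mask)  # keyword supported

try:
    forwarding_sum(OldDuckArray(data), where=mask)
    error = None
except TypeError as exc:
    error = exc  # the old method does not know about `where`
```

In this sketch the old object keeps working as long as nobody passes
`where=`; the failure only appears once a caller (or the library itself)
starts using the new keyword.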

tl;dr this is simply a cost we have to be aware of when either proposing to
add new keywords, or when proposing any kind of dispatching logic (in this
case `_wrapreduction`).

Regarding internal use of `np.sum(..., where=)`: this should not be done
until at least 4-5 versions from now, and preferably way longer than that,
because doing so will break already released versions of Pandas, Dask, and
other libraries with array-like objects.

Cheers,
Ralf