Extent to which to work around matrix and other duck/subclass limitations
Hi All,

In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.

A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.

The question now is what to do, with the options being:

1. Stick with the existing implementation; the speed-up is not that great anyway.
2. Use try/except around the new implementation and fall back to the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway, and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.

All the best,

Marten

p.s. One could also avoid the `.sum()` override altogether by doing `np.add.reduce(..., where=...)`, but this would probably break just as much.
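[For readers following along, the simplification under discussion boils down to something like the following sketch; it assumes numpy >= 1.17 (which has the `where=` keyword) and a plain float ndarray, and the function name is made up for illustration:]

```python
import numpy as np

def nansum_via_where(a, axis=None):
    # sketch only: assumes `a` coerces to a plain float ndarray and
    # that np.sum supports the `where=` keyword (numpy >= 1.17)
    a = np.asarray(a)
    return np.sum(a, axis=axis, where=~np.isnan(a))

print(nansum_via_where(np.array([1.0, np.nan, 2.0])))  # 3.0
```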
On Mon, 2019-06-10 at 13:47 -0400, Marten van Kerkwijk wrote:
Hi All,
In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.
Hi Marten! I ran into a similar issue with the `initial` kwarg when I implemented it, except that at the time I just used the `np._NoValue` sentinel.
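[The `np._NoValue` trick amounts to only forwarding a keyword when the caller actually supplied it, so objects whose methods predate the keyword keep working. A minimal sketch, with hypothetical names:]

```python
import numpy as np

def forwarding_sum(a, initial=np._NoValue):
    # only pass `initial` on when the caller supplied it, so objects
    # whose .sum() predates the keyword are not broken
    kwargs = {} if initial is np._NoValue else {"initial": initial}
    return a.sum(**kwargs)

class OldStyle:
    # mimic with a .sum() signature that predates the `initial` keyword
    def sum(self):
        return 42

print(forwarding_sum(OldStyle()))                # 42, no TypeError
print(forwarding_sum(np.arange(3), initial=10))  # 13
```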
The question now is what to do, with the options being:

1. Stick with the existing implementation; the speed-up is not that great anyway.
2. Use try/except around the new implementation and fall back to the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway, and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.
If nansum does any other kind of dispatch, that should be kept around in any case. Otherwise, it would have failed then and it would fail now. We can catch and re-raise the right type of exception for backwards compatibility if needed.
All the best,
Marten
p.s. One could also avoid the `.sum()` override altogether by doing `np.add.reduce(..., where=...)`, but this would probably break just as much.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi All,
In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.
The question now is what to do, with the options being:

1. Stick with the existing implementation; the speed-up is not that great anyway.
2. Use try/except around the new implementation and fall back to the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway, and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.
Honestly, I agree with Tyler's assessment when he closed the PR: "there's not much evidence of remarkable performance improvement, and many of the other nan functions would have more complicated implementation details that are unlikely to make the relative performance much better. It doesn't seem to be a priority either."

The code also got more rather than less complex. So why do this? Yes, you give a reason why it may help duck arrays, but it also apparently breaks things.

Re matrix: it's not deprecated yet and is still fairly widely used, so yes, we'd accept a PR, and no, we should not break it gratuitously.

Cheers,
Ralf
On Tue, 2019-06-11 at 10:56 +0200, Ralf Gommers wrote:
On Mon, Jun 10, 2019 at 7:47 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi All,
In https://github.com/numpy/numpy/pull/12801, Tyler has been trying to use the new `where` argument for reductions to implement `nansum`, etc., using simplifications that boil down to `np.sum(..., where=~isnan(...))`.
A problem that occurs is that `np.sum` will use a `.sum` method if that is present, and for matrix, the `.sum` method does not take a `where` argument. Since the `where` argument has been introduced only recently, similar problems may happen for array mimics that implement their own `.sum` method.
The question now is what to do, with the options being:

1. Stick with the existing implementation; the speed-up is not that great anyway.
2. Use try/except around the new implementation and fall back to the old one if it fails.
3. As (2), but emit a deprecation warning. This will help array mimics, but not matrix (unless someone makes a PR; would we even accept it?).
4. Use the new implementation. `matrix` should be gone anyway, and array mimics can either update their `.sum()` method or override `np.nansum` with `__array_function__`.

Personally, I'd prefer (4), but realize that (3) is probably the safer approach, even if it is really annoying to still have to take matrix deficiencies into account.
Honestly, I agree with Tyler's assessment when he closed the PR: "there's not much evidence of remarkable performance improvement, and many of the other nan functions would have more complicated implementation details that are unlikely to make the relative performance much better. It doesn't seem to be a priority either."
The code also got more rather than less complex. So why do this? Yes you give a reason why it may help duck arrays, but it also breaks things apparently.
There could be reasons for this, since replacing with 0 may not always work for object arrays. But that is rather hypothetical (e.g. using add as string concatenation).

Calling `np.add.reduce` instead of the sum method could be slightly more compatible, but could have its own set of issues. But I suppose it is true: unless we plan on adding specialized `where` loops or a nanadd ufunc itself, the gain is likely so small that it arguably makes more sense to defer this until there is an actual benefit (and less pain due to adoption of the new protocols).

Best,
Sebastian
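[The object-array caveat can be made concrete: for `object` dtype, `add` may mean something for which 0 is not an identity, such as string concatenation, so substituting 0 for skipped elements is not generally safe:]

```python
import numpy as np

# for object arrays, `add` can be string concatenation ...
a = np.array(["spam", "eggs"], dtype=object)
print(np.add.reduce(a))  # 'spameggs'

# ... so the replace-with-0 strategy would effectively compute
# 0 + 'spam' + 'eggs', which raises TypeError instead of a useful result
```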
Re matrix: it's not deprecated yet and is still fairly widely used, so yes we'd accept a PR and no we should not break it gratuitously.
Cheers, Ralf
OK, fair enough to just drop this!

In this particular case, I do not care that much about duck-typing, just about making the implementation cleaner (which it is if one doesn't have to worry about diversion via a `.sum()` method). In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.

-- Marten

On Tue, Jun 11, 2019 at 11:53 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
Should have added: rely on an out-of-date numpy API where we have multiple ways for packages to provide their own overrides. But I guess in this case it is really matrix which is the problem. If so, maybe just adding the new kwargs to its methods is something we should consider.
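[Adding the keywords could be as simple as having the subclass's `sum` accept and forward them; a hypothetical sketch for an ndarray subclass (the class name is made up, and this relies on `np.sum` forwarding extra keywords to the method):]

```python
import numpy as np

class Mimic(np.ndarray):
    # hypothetical subclass: accept newer reduction keywords (where=,
    # initial=, ...) via **kwargs and forward them to the base sum
    def sum(self, axis=None, dtype=None, out=None, **kwargs):
        return np.ndarray.sum(np.asarray(self), axis, dtype, out, **kwargs)

x = np.arange(5).view(Mimic)
print(np.sum(x, where=np.asarray(x) > 2))  # 3 + 4 = 7
```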
-- Marten
On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
This, and the comments Nathaniel made on the array function thread, are important to take note of. Would it be worth updating NEP 18 with a list of pitfalls? Or should this be a new informational NEP that discusses—on a higher level—the benefits, risks, and design considerations of providing protocols? Stéfan
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are important to take note of. Would it be worth updating NEP 18 with a list of pitfalls? Or should this be a new informational NEP that discusses—on a higher level—the benefits, risks, and design considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify:

1. np.sum gained a new `where=` keyword in 1.17.0.
2. np.sum(x) will detect an `x.sum` method if it's present and try to use that.
3. The `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keyword that has a value other than the default np._NoValue.

Code to check this:
x1 = np.arange(5)
x2 = np.asmatrix(x1)
np.sum(x1)              # works
np.sum(x2)              # works
np.sum(x1, where=x1>3)  # works
np.sum(x2, where=x2>3)  # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:
y = pd.Series(x1)
np.sum(y)             # works
np.sum(y, where=y>3)  # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects whose methods get forwarded to have gained the same keyword.

tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).

Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so would break already-released versions of Pandas, Dask, and other libraries with array-like objects.

Cheers,
Ralf
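[A stripped-down sketch of the forwarding logic (not the actual `_wrapreduction` code, whose details differ) shows where the TypeError comes from:]

```python
import numpy as np

def wrapreduction_sketch(obj, **kwargs):
    # keywords left at the sentinel were not supplied and are dropped;
    # everything else is forwarded verbatim to the object's own method
    passkwargs = {k: v for k, v in kwargs.items() if v is not np._NoValue}
    if type(obj) is not np.ndarray and hasattr(obj, "sum"):
        # forwarded call: raises TypeError if obj.sum() lacks a keyword
        return obj.sum(**passkwargs)
    return np.add.reduce(obj, **passkwargs)

class NoWhere:
    # mimic whose .sum() predates the `where=` keyword
    def sum(self):
        return 0

wrapreduction_sketch(NoWhere())  # fine
try:
    wrapreduction_sketch(NoWhere(), where=True)
except TypeError as exc:
    print(exc)  # sum() got an unexpected keyword argument 'where'
```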
Hi Ralf,

You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse, given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).

Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?

Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into them - that suggests that currently they do not work for duck arrays like Dask.

All the best,

Marten

On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backend compatibility. From an API perspective, nothing would change -- we already cast inputs into base numpy arrays inside the _replace_nan() routine.
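[Under that assumption the rewrite would be straightforward; a sketch for float/complex and integer inputs only, with an illustrative name:]

```python
import numpy as np

def nansum_rewrite(a, axis=None):
    a = np.asarray(a)  # coerce to a base ndarray, as _replace_nan already does
    if a.dtype.kind in "fc":  # floating or complex: NaNs possible
        return np.add.reduce(a, axis=axis, where=~np.isnan(a))
    return np.add.reduce(a, axis=axis)
```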
All the best,
Marten
On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
In general, I'd say yes. This is a problem of our own making. If an implementation uses np.sum on an argument that can be a subclass or duck array (i.e. we didn't coerce with asarray), then the code is calling out to code outside of numpy. At that point we have effectively made something public. We can still change it, but only if we're sure that we don't break things in the process (i.e. we then insert a new asarray, something you're not happy about in general ...).

I don't get the "three ways to override nansum", by the way. There's only one, I think: __array_function__. There may be more for the sum() method.

Another way to say the same thing: if a subclass does everything right, and we decide to add a new keyword to some function in 1.17 without considering those subclasses, then turning around straight after and saying "oh, those subclasses rely on our old API, it may be okay to either break them or send them a deprecation warning" (which is I think what you were advocating for - your options 4 and 3) is not reasonable.
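[The `__array_function__` route, which works regardless of the internal implementation, looks roughly like this minimal hypothetical duck array (class name and NaN handling made up for illustration):]

```python
import numpy as np

class Duck:
    # minimal duck array that overrides np.nansum via NEP 18
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.nansum:
            # our own implementation, free of any .sum() forwarding
            return float(np.add.reduce(self.data, where=~np.isnan(self.data)))
        return NotImplemented

print(np.nansum(Duck([1.0, np.nan, 2.0])))  # 3.0
```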
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
There being an issue with matrix in the PR suggests there is an issue for subclasses at least?
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backends compatibility. From an API perspective, nothing would change -- we already cast inputs into base numpy arrays inside the _replace_nan() routine.
In that case yes, nansum can be rewritten. However, it's only because of the use of (as)array - if it were asanyarray that would already scupper the plan. Cheers, Ralf
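[The distinction Ralf points to, in code:]

```python
import numpy as np

m = np.asmatrix(np.arange(4))
print(type(np.asarray(m)))     # <class 'numpy.ndarray'> -- subclass dropped
print(type(np.asanyarray(m)))  # <class 'numpy.matrix'>  -- subclass preserved
```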
Hi All,

The `nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the plan to use `asarray` - sorry to have been sloppy with stating they "convert to array". And, yes, I'll admit to forgetting that we are introducing `where` only in 1.17 - clearly we cannot rely on other classes to have already implemented it!! (This is the problem with having done it myself - it seems ages ago!)

For the old implementation, there was indeed only one way to override (`__array_function__`) for non-subclasses, but for the new implementation there are three ways, though the first two overlap:

- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`

I think that eventually one would like to move to the case where there is only one (`__array_ufunc__` in this case, since `nansum` really is just a combination of two ufuncs, isnan and add.reduce). Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override. Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold turkey if the new implementation is covered by `__array_ufunc__`?

All the best,

Marten

On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
In general, I'd say yes. This is a problem of our own making. If an implementation uses np.sum on an argument that can be a subclass or duck array (i.e. we didn't coerce with asarray), then the code is calling out to outside of numpy. At that point we have effectively made something public. We can still change it, but only if we're sure that we don't break things in the process (i.e. we then insert a new asarray, something you're not happy about in general ....).
I don't get the "three ways to override nansum" by the way. There's only one I think: __array_function__. There may be more for the sum() method.
Another way to say the same thing: if a subclass does everything right, and we decide to add a new keyword to some function in 1.17 without considering those subclasses, then turning around straight after and saying "oh those subclasses rely on our old API, it may be okay to either break them or send them a deprecation warning" (which is I think what you were advocating for - your 4 and 3 options) is not reasonable.
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
There being an issue with matrix in the PR suggests there is an issue for subclasses at least?
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backends compatibility. From an API perspective, nothing would change -- we already cast inputs into base numpy arrays inside the _replace_nan() routine.
In that case yes, nansum can be rewritten. However, it's only because of the use of (as)array - if it were asanyarray that would already scupper the plan.
Cheers, Ralf
All the best,
Marten
On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt < stefanv@berkeley.edu> wrote:
On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:
In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are important to take note of. Would it be worth updating NEP 18 with a list of pitfalls? Or should this be a new informational NEP that discusses—on a higher level—the benefits, risks, and design considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect an `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue
Code to check this:
x1 = np.arange(5)
x2 = np.asmatrix(x1)
np.sum(x1)                # works
np.sum(x2)                # works
np.sum(x1, where=x1>3)    # works
np.sum(x2, where=x2>3)    # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:
y = pd.Series(x1)
np.sum(y)                 # works
np.sum(y, where=y>3)      # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects with the methods that get forwarded to gain the same keyword.
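In simplified form (illustrative names only, not the actual private numpy code), that forwarding logic looks roughly like:

```python
import numpy as np

class _NoValue:
    """Stand-in for numpy's private np._NoValue sentinel."""

def wrapreduction_sketch(obj, ufunc, method_name, **kwargs):
    # Keywords left at the sentinel default are dropped; anything the
    # caller actually passed is forwarded verbatim to the method.
    passed = {k: v for k, v in kwargs.items() if v is not _NoValue}
    if not isinstance(obj, np.ndarray):
        method = getattr(obj, method_name, None)
        if method is not None:
            # This is where a where= the method does not know about
            # escapes as a TypeError.
            return method(**passed)
    return ufunc.reduce(obj, **passed)

class OldSum:
    """Toy array-like whose .sum() predates the where= keyword."""
    def __init__(self, data):
        self.data = data
    def sum(self):
        return sum(self.data)

print(wrapreduction_sketch(np.arange(5), np.add, "sum"))            # 10
print(wrapreduction_sketch(OldSum([1, 2, 3]), np.add, "sum"))       # 6
# wrapreduction_sketch(OldSum([1, 2, 3]), np.add, "sum",
#                      where=[True, False, True])  # -> TypeError
```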
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
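A hypothetical compatibility shim in the spirit of the try/except option from the start of this thread would look something like the following (illustrative only; both names are made up):

```python
import numpy as np

def sum_with_where_fallback(a, mask):
    # Try the new where= keyword first; fall back to the old
    # copy-and-zero approach for objects whose .sum() predates it.
    try:
        return np.sum(a, where=mask)
    except TypeError:
        arr = np.asarray(a).copy()
        arr[~np.asarray(mask)] = 0
        return np.sum(arr)

class LegacyDuck:
    """Toy array-like whose .sum() does not accept where=."""
    def __init__(self, data):
        self.data = np.asarray(data)
    def sum(self, axis=None, dtype=None, out=None):
        return self.data.sum(axis=axis, dtype=dtype, out=out)
    def __array__(self, dtype=None, copy=None):
        return np.array(self.data, dtype=dtype)

mask = np.array([True, False, True])
print(sum_with_where_fallback(LegacyDuck([1.0, 5.0, 2.0]), mask))  # 3.0
```

The cost Ralf describes is visible here: every call on a legacy object pays for a raised-and-caught TypeError, and the fallback silently coerces to a base ndarray.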
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Jun 13, 2019 at 2:51 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi All,
`nanfunctions` use `asanyarray` currently, which as Ralf notes scuppers the plan to use `asarray` - sorry to have been sloppy with stating they "convert to array".
And, yes, I'll admit to forgetting that we are introducing `where` only in 1.17 - clearly we cannot rely on other classes to have already implemented it!! (This is the problem with having done it myself - it seems ages ago!)
For the old implementation, there was indeed only one way to override (`__array_function__`) for non-subclasses, but for the new implementation there are three ways, though the first two overlap:
- support `.sum()` with all its arguments and `__array_ufunc__` for `isnan`
- support `__array_ufunc__` including reductions (with all their arguments)
- support `__array_function__`
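As a toy illustration of the third route, a duck array could intercept np.nansum like this (the class name is invented for the example):

```python
import numpy as np

class DuckArray:
    """Toy duck array overriding np.nansum via __array_function__."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.nansum:
            # Unwrap to a plain ndarray and delegate to numpy.
            return np.nansum(self.data, *args[1:], **kwargs)
        # Decline everything else; numpy then raises TypeError.
        return NotImplemented

x = DuckArray([1.0, np.nan, 2.0])
print(np.nansum(x))  # 3.0
```

Note that this route never touches the `.sum()` method or the ufunc machinery at all, which is exactly why the three routes can disagree.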
I think that eventually one would like to move to the case where there is only one (`__array_ufunc__` in this case, since really it is just a combination of two ufuncs, isnan and add.reduce in this case).
I don't think the other ones are really ways of "overriding nansum". They happen to work, but rely on internal implementation details and then overriding functions inside those details. It sounds like you're suggesting a new way of declaring "this is a canonical implementation that duck arrays can rely on". I can see that happening, but we probably want some standard way to mark those functions, have a note in the docs, and some tests.
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold-turkey if the new implementation is covered by `__array_ufunc__`?
I think __array_function__ is still the best way to do this (that's the only actual override, so likely the most robust and performant), so I don't see any reason for a deprecation.

Cheers,
Ralf
All the best,
Marten
On Thu, Jun 13, 2019 at 4:18 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Jun 13, 2019 at 3:16 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Wed, Jun 12, 2019 at 5:55 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
You're right, the problem is with the added keyword argument (which would appear also if we did not still have to support the old .sum method override but just dispatched to __array_ufunc__ with `np.add.reduce` - maybe worse given that it seems likely the reduce method has seen much less testing in __array_ufunc__ implementations).
Still, I do think the question stands: we implement a `nansum` for our ndarray class a certain way, and provide ways to override it (three now, in fact). Is it really reasonable to expect that we wait 4 versions for other packages to keep up with this, and thus get stuck with given internal implementations?
In general, I'd say yes. This is a problem of our own making. If an implementation uses np.sum on an argument that can be a subclass or duck array (i.e. we didn't coerce with asarray), then the code is calling out to outside of numpy. At that point we have effectively made something public. We can still change it, but only if we're sure that we don't break things in the process (i.e. we then insert a new asarray, something you're not happy about in general ....).
I don't get the "three ways to override nansum" by the way. There's only one I think: __array_function__. There may be more for the sum() method.
Another way to say the same thing: if a subclass does everything right, and we decide to add a new keyword to some function in 1.17 without considering those subclasses, then turning around straight after and saying "oh those subclasses rely on our old API, it may be okay to either break them or send them a deprecation warning" (which is I think what you were advocating for - your 4 and 3 options) is not reasonable.
Aside: note that the present version of the nanfunctions relies on turning the arguments into arrays and copying 0s into it - that suggests that currently they do not work for duck arrays like Dask.
There being an issue with matrix in the PR suggests there is an issue for subclasses at least?
Agreed. We could safely rewrite things to use np.asarray(), without any need to worry about backwards compatibility. From an API perspective, nothing would change -- we already cast inputs into base numpy arrays inside the _replace_nan() routine.
In that case yes, nansum can be rewritten. However, it's only because of the use of (as)array - if it were asanyarray that would already scupper the plan.
Cheers, Ralf
All the best,
Marten
On Wed, Jun 12, 2019 at 4:32 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Jun 12, 2019 at 12:02 AM Stefan van der Walt < stefanv@berkeley.edu> wrote:
On Tue, 11 Jun 2019 15:10:16 -0400, Marten van Kerkwijk wrote:

In a way, I brought it up mostly as a concrete example of an internal implementation which we cannot change to an objectively cleaner one because other packages rely on an out-of-date numpy API.
I think this is not the right way to describe the problem (see below).
This, and the comments Nathaniel made on the array function thread, are important to take note of. Would it be worth updating NEP 18 with a list of pitfalls? Or should this be a new informational NEP that discusses—on a higher level—the benefits, risks, and design considerations of providing protocols?
That would be a nice thing to do (the higher level one), but in this case I think the issue has little to do with NEP 18. The summary of the issue in this thread is a little brief, so let me try to clarify.
1. np.sum gained a new `where=` keyword in 1.17.0
2. using np.sum(x) will detect an `x.sum` method if it's present and try to use that
3. the `_wrapreduction` utility that forwards the function to the method will compare signatures of np.sum and x.sum, and throw an error if there's a mismatch for any keywords that have a value other than the default np._NoValue
Code to check this:
x1 = np.arange(5)
x2 = np.asmatrix(x1)
np.sum(x1)                # works
np.sum(x2)                # works
np.sum(x1, where=x1>3)    # works
np.sum(x2, where=x2>3)    # _wrapreduction throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
Note that this is not specific to np.matrix. Using pandas.Series you also get a TypeError:
y = pd.Series(x1)
np.sum(y)                 # works
np.sum(y, where=y>3)      # pandas throws TypeError
...
TypeError: sum() got an unexpected keyword argument 'where'
The issue is that when we have this kind of forwarding logic, irrespective of how it's implemented, new keywords cannot be used until the array-like objects with the methods that get forwarded to gain the same keyword.
tl;dr this is simply a cost we have to be aware of when either proposing to add new keywords, or when proposing any kind of dispatching logic (in this case `_wrapreduction`).
Regarding internal use of `np.sum(..., where=)`: this should not be done until at least 4-5 versions from now, and preferably way longer than that. Because doing so will break already released versions of Pandas, Dask, and other libraries with array-like objects.
Cheers, Ralf
Hi Ralf, others,
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold-turkey if the new implementation is covered by `__array_ufunc__`?
I think __array_function__ is still the best way to do this (that's the only actual override, so most robust and performant likely), so I don't see any reason for a deprecation.
Yes, I fear I have to agree for the nan-functions, at least for now...
But how about `np.sum` itself? Right now, it is overridden by __array_function__, but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.

Would/should there be a point where we just have `sum = np.add.reduce` and drop the other overrides? If so, how do we get there?

One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.

All the best,

Marten
On Thu, Jun 13, 2019 at 9:35 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf, others,
Anyway, I guess this is still a good example to consider for how we should go about getting to a new implementation, ideally with just a single way to override?
Indeed, how do we actually envisage deprecating the use of `__array_function__` for a given part of the numpy API? Are we allowed to go cold-turkey if the new implementation is covered by `__array_ufunc__`?
I think __array_function__ is still the best way to do this (that's the only actual override, so most robust and performant likely), so I don't see any reason for a deprecation.
Yes, I fear I have to agree for the nan-functions, at least for now...
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
Yes, I think we would need to do this sort of thing. It's a bit of trouble, but probably doable with some decorator magic. It would indeed be nice for sum() to eventually just be np.add.reduce, though to be honest I'm not entirely sure it's worth the trouble of a long deprecation cycle -- people have been relying on the fall-back calling of methods for a long time.
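A toy sketch of that reversed lookup order (purely illustrative - not numpy's actual behaviour, and the names are made up) might look like:

```python
import numpy as np

def sum_reversed_lookup(obj, **kwargs):
    # Hypothetical reversed dispatch: prefer the ufunc machinery, which
    # __array_ufunc__ implementors hook into, and only fall back to a
    # legacy .sum() method if the reduction is refused.
    try:
        return np.add.reduce(obj, **kwargs)
    except TypeError:
        method = getattr(obj, "sum", None)
        if method is not None:
            return method(**kwargs)
        raise

class OptedOut:
    """Toy class that opts out of ufuncs but still has a .sum() method."""
    __array_ufunc__ = None  # makes all ufunc calls raise TypeError
    def __init__(self, data):
        self.data = list(data)
    def sum(self):
        return sum(self.data)

print(sum_reversed_lookup(np.arange(5)))        # 10
print(sum_reversed_lookup(OptedOut([1, 2, 3]))) # 6, via the fallback
```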
All the best,
Marten
On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer <shoyer@gmail.com> wrote: <snip>
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
Yes, I think we would need to do this sort of thing. It's a bit of trouble, but probably doable with some decorator magic. It would indeed be nice for sum() to eventually just be np.add.reduce, though to be honest I'm not entirely sure it's worth the trouble of a long deprecation cycle -- people have been relying on the fall-back calling of methods for a long time.
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works. (And, yes, arguably too late to change it now!)

-- Marten
All the best,
Marten
On Thu, Jun 13, 2019 at 7:07 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
On Thu, Jun 13, 2019 at 12:46 PM Stephan Hoyer <shoyer@gmail.com> wrote: <snip>
But how about `np.sum` itself? Right now, it is overridden by __array_function__ but classes without __array_function__ support can also override it through the method lookup and through __array_ufunc__.
Would/should there be a point where we just have `sum = np.add.reduce` and drop other overrides? If so, how do we get there?
One option might be to start reversing the order in `_wrapreduction` - try `__array_ufunc__` if it is defined, and only if that fails try the `.sum` method.
Yes, I think we would need to do this sort of thing. It's a bit of trouble, but probably doable with some decorator magic. It would indeed be nice for sum() to eventually just be np.add.reduce, though to be honest I'm not entirely sure it's worth the trouble of a long deprecation cycle -- people have been relying on the fall-back calling of methods for a long time.
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
I'm not sure I understand the rationale for this. Design consistency matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and other functions __array_function__. Why keep on tweaking things for little benefit?

Ralf
(And, yes, arguably too late to change it now!)
Hi Ralf,
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
I'm not sure I understand the rationale for this. Design consistency matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and other functions __array_function__. Why keep on tweaking things for little benefit?
I'm mostly trying to understand how we would actually change things. I guess your quite logical argument for consistency is that it requires `np.sum is np.add.reduce`. But to do that, one would have to get rid of the `.sum()` method override, and then deprecate using `__array_function__` on `np.sum`.

A perhaps clearer example is `np.isposinf`, where the implementation truly consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess following your consistency logic, one would not remove the `__array_function__` override until it actually became a ufunc itself.

Anyway, I think this discussion has been useful, if only in making it yet more clear how difficult it is to deprecate anything!

All the best,

Marten
On Thu, Jun 13, 2019 at 9:43 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
I guess the one immediate question is whether `np.sum` and the like should be overridden by `__array_function__` at all, given that what should be the future recommended override already works.
I'm not sure I understand the rationale for this. Design consistency matters. Right now the rule is simple: all ufuncs have __array_ufunc__, and other functions __array_function__. Why keep on tweaking things for little benefit?
I'm mostly trying to understand how we would actually change things.
I think we do have ways to deprecate things and reason about how to make the trade-offs to do so. Function overrides are not a whole lot different I think; we can apply the same method (plus there's a special bit in NEP 18 that we reserve the right to change functions into ufuncs and use `__array_ufunc__`).

I guess your quite logical argument for consistency is that it requires `np.sum is np.add.reduce`. But to do that, one would have to get rid of the `.sum()` method override, and then deprecate using `__array_function__` on `np.sum`.
A perhaps clearer example is `np.isposinf`, where the implementation truly consists of ufunc calls only (indeed, it is in `lib/ufunc_like`). I guess following your consistency logic, one would not remove the `__array_function__` override until it actually became a ufunc itself.
Correct. More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one-off situation) before there's a new bigger picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently; perhaps that clarifies the previous sentence a bit and gives some more insight into my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/

Cheers,
Ralf
Anyway, I think this discussion has been useful, if only in making it yet more clear how difficult it is to deprecate anything!
All the best,
Marten
Hi Ralf,

Thanks both for the reply and sharing the link. I recognize much (from both sides!).

<snip>
More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one off situation) before there's a new bigger picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad-hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently, perhaps that clarifies the previous sentence a bit and gives some more insight in my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
Yes, I definitely did not mean to imply that a goal was to change just `isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread was to clean up the whole `nanfunctions` module). Rather, I used them as examples to see what policy there actually is or should be; and I do worry that with __array_function__, however happy I am that it now exists (finally, Quantity can be concatenated!), we're going the Microsoft route of just layering on top of old code if even for the simplest cases there is no obvious path for how to remove it.

Anyway, this discussion has probably gotten beyond the point where much is added. I'll close the `nansum` PR.

All the best,

Marten

p.s. I would say that deprecations within numpy currently *are* hard. E.g., the rate at which we add `DEPRECATED` to the C code is substantially larger than that with which we actually remove any long-deprecated behaviour.
On Thu, Jun 13, 2019 at 6:21 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
Thanks both for the reply and sharing the link. I recognize much (from both sides!).
<snip>
More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one off situation) before there's a new bigger picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad-hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently, perhaps that clarifies the previous sentence a bit and gives some more insight in my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
Yes, I definitely did not mean to imply that a goal was to change just
`isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread was to clean up the whole `nanfunctions` module). Rather, to use them as examples to see what policy there actually is or should be; and I do worry that with __array_function__, however happy I am that it now exists (finally, Quantity can be concatenated!), we're going the Microsoft route of just layering on top of old code if even for the simplest cases there is no obvious path for how to remove it.
Anyway, this discussion has probably gotten beyond the point where much is added. I'll close the `nansum` PR.
All the best,
Marten
p.s. I would say that deprecations within numpy currently *are* hard. E.g., the rate at which we add `DEPRECATED` to the C code is substantially larger than that with which we actually remove any long-deprecated behaviour.
That's been true for the last couple of releases, but not historically. The main problem was lack of time, compounded by a FutureWarning that got stuck because it broke code in unexpected ways when brought to fruition. For 1.16 I decided that expiring spent deprecations wasn't worth the churn. At this point I see 1.18 as the release that starts the cleanup, and possibly the start of removing the Python 2.7 compatibility code.

Speaking of 1.18, the other thing I'd like to see is discussion of a masked array replacement. I think the basic infrastructure is now in place for that. Hmm... should probably make a separate post eliciting ideas for 1.18.

Chuck
On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
Thanks both for the reply and sharing the link. I recognize much (from both sides!).
<snip>
More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one off situation) before there's a new bigger picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad-hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently, perhaps that clarifies the previous sentence a bit and gives some more insight in my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
Yes, I definitely did not mean to imply that a goal was to change just
`isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread was to clean up the whole `nanfunctions` module). Rather, to use them as examples to see what policy there actually is or should be; and I do worry that with __array_function__, however happy I am that it now exists (finally, Quantity can be concatenated!), we're going the Microsoft route of just layering on top of old code if even for the simplest cases there is no obvious path for how to remove it.
I share that worry to some extent (an ever-growing collection of protocols). To be fair, we knew that __array_function__ wasn't perfect, but I think Stephan did a really good job making the case for why it was necessary, and that we needed to get something in place in 6-12 months rather than spending years on a more polished/comprehensive design. Given those constraints, we seem to have made a good choice that has largely achieved its goals.

I'm still not sure you got my main objection. So I'll try once more. We now have a design, imperfect but it exists and is documented (to some extent). Let's call it design A. Now you're wanting clarity on a policy on how to move from design A to design B. However, what B even is isn't spelled out, although we can derive rough outlines from mailing list threads like these (it involves better handling of subclasses, allowing reuse of implementation of numpy functions in terms of other numpy functions, etc.). The right way forward is:

1. describe what design B is
2. decide if that design is a good idea
3. then worry about implementation and a migration policy

Something that's specific to all nan-functions is still way too specific, and doesn't justify skipping 1-2 and jumping straight to 3.

I don't know how to express it any clearer than the above in email format. If it doesn't make sense to you, it'd be great to have a call to discuss in person.

Cheers,
Ralf
Hi Ralf,

Thanks for the clarification. I think in your terms the bottom line was that I thought we had a design B for the case where a function was really "just a ufunc". But the nanfunctions show that even if logically they are a ufunc (which admittedly uses another ufunc or two for `where`), it is tricky, certainly trickier than I thought. And this discussion has served to clarify that even for other "simpler" functions it can get similarly tricky.

Anyway, the bottom line is that I think you are right that we actually need a more proper discussion/design about how to move forward.

All the best,

Marten

p.s. And, yes, `__array_function__` is quite wonderful!

On Fri, Jun 14, 2019 at 3:46 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Jun 14, 2019 at 2:21 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
Thanks both for the reply and sharing the link. I recognize much (from both sides!).
<snip>
More importantly, I think we should not even consider *discussing* removing `__array_function__` from np.isposinf (or any similar one off situation) before there's a new bigger picture design. This is not about "deprecation is hard", this is about doing things with purpose rather than ad-hoc, as well as recognizing that lots of small changes are a major drain on our limited maintainer resources. About the latter issue I wrote a blog post recently, perhaps that clarifies the previous sentence a bit and gives some more insight in my perspective: https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/
Yes, I definitely did not mean to imply that a goal was to change just
`isposinf`, `sum`, or `nansum` (the goal of the PR that started this thread was to clean up the whole `nanfunctions` module). Rather, to use them as examples to see what policy there actually is or should be; and I do worry that with __array_function__, however happy I am that it now exists (finally, Quantity can be concatenated!), we're going the Microsoft route of just layering on top of old code if even for the simplest cases there is no obvious path for how to remove it.
I share that worry to some extent (an ever-growing collection of protocols). To be fair, we knew that __array_function__ wasn't perfect, but I think Stephan did a really good job making the case for why it was necessary, and that we needed to get something in place in 6-12 months rather than spending years on a more polished/comprehensive design. Given those constraints, we seem to have made a good choice that has largely achieved its goals.
I'm still not sure you got my main objection. So I'll try once more. We now have a design, imperfect but it exists and is documented (to some extent). Let's call it design A. Now you're wanting clarity on a policy on how to move from design A to design B. However, what B even is isn't spelled out, although we can derive rough outlines from mailing list threads like these (it involves better handling of subclasses, allowing reuse of implementation of numpy functions in terms of other numpy functions, etc.). The right way forward is:
1. describe what design B is
2. decide if that design is a good idea
3. then worry about implementation and a migration policy
Something that's specific to all nan-functions is still way too specific, and doesn't justify skipping 1-2 and jumping straight to 3.
I don't know how to express it any clearer than the above in email format. If it doesn't make sense to you, it'd be great to have a call to discuss in person.
Cheers, Ralf
participants (7)

- Charles R Harris
- Hameer Abbasi
- Marten van Kerkwijk
- Ralf Gommers
- Sebastian Berg
- Stefan van der Walt
- Stephan Hoyer