Should ndarray subclasses support the keepdims arg?

Hello all, A question: Many ndarray methods (eg sum, mean, any, min) have a "keepdims" keyword argument, but ndarray subclass methods sometimes don't. The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument. This means you can't always use ndarray subclasses as 'drop in' replacement for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (eg np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think? ---- Details: ---- This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum). Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims. I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, eg pandas.DataFrame. **kwargs tricks might work. But if it's permissible I think it would be better to require subclasses to support all the keyword args ndarray supports. Allan

On Di, 2015-05-05 at 11:13 -0400, Allan Haldane wrote:
Hello all,
A question:
Many ndarray methods (eg sum, mean, any, min) have a "keepdims" keyword argument, but ndarray subclass methods sometimes don't. The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument.
This means you can't always use ndarray subclasses as 'drop in' replacement for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (eg np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think?
---- Details: ----
This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum). Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims.
I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, eg pandas.DataFrame. **kwargs tricks might work. But if it's permissible I think it would be better to require subclasses to support all the keyword args ndarray supports.
What is the advantage over having an error raised due to the invalid **kwargs trick when the subclass does not support it? First sight it seems like a far shot have a hard requirement. The transition period alone seems hard, unless we have add magic to test the subclass upon creation, and I am not sure that is easy to do (something like ABC conformance test). - Sebastian
Allan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

AFAICT the only real solution here is for np.sum and friends to propagate the keepdims argument if and only if it was explicitly passed to them (or maybe the slightly different, if and only if it has a non-default value). If we just started requiring code to handle it and passing it unconditionally, then as soon as someone upgraded numpy all their existing code might break for no good reason. On May 5, 2015 8:13 AM, "Allan Haldane" <allanhaldane@gmail.com> wrote:
Hello all,
A question:
Many ndarray methods (eg sum, mean, any, min) have a "keepdims" keyword argument, but ndarray subclass methods sometimes don't. The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument.
This means you can't always use ndarray subclasses as 'drop in' replacement for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (eg np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think?
---- Details: ----
This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum). Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims.
I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, eg pandas.DataFrame. **kwargs tricks might work. But if it's permissible I think it would be better to require subclasses to support all the keyword args ndarray supports.
Allan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Maybe they should have written their code with **kwargs that consumes all keyword arguments rather than assuming that no keyword arguments would be added? The problem with this approach in general is that it makes writing code unnecessarily convoluted. On Tue, May 5, 2015 at 1:55 PM, Nathaniel Smith <njs@pobox.com> wrote:
AFAICT the only real solution here is for np.sum and friends to propagate the keepdims argument if and only if it was explicitly passed to them (or maybe the slightly different, if and only if it has a non-default value). If we just started requiring code to handle it and passing it unconditionally, then as soon as someone upgraded numpy all their existing code might break for no good reason. On May 5, 2015 8:13 AM, "Allan Haldane" <allanhaldane@gmail.com> wrote:
Hello all,
A question:
Many ndarray methods (eg sum, mean, any, min) have a "keepdims" keyword argument, but ndarray subclass methods sometimes don't. The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument.
This means you can't always use ndarray subclasses as 'drop in' replacement for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (eg np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think?
---- Details: ----
This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum). Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims.
I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, eg pandas.DataFrame. **kwargs tricks might work. But if it's permissible I think it would be better to require subclasses to support all the keyword args ndarray supports.
Allan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On May 5, 2015 11:05 AM, "Neil Girdhar" <mistersheik@gmail.com> wrote:
Maybe they should have written their code with **kwargs that consumes all
keyword arguments rather than assuming that no keyword arguments would be added? The problem with this approach in general is that it makes writing code unnecessarily convoluted. If the user asked for keepdims=True, then silently ignoring this is worse than raising an error. And I guess I would call this making code necessarily convoluted :-). There are not that many options for evolving an interface shared by multiple unrelated libraries. -n

That makes sense. I think it's the way to go, thanks. The downside is using **kwargs instead of an explicit keepdims arg gives a more obscure signature, but using the new __signature__ attribute this could be hidden for python 3 users and python2 using ipython3+. On 05/05/2015 01:55 PM, Nathaniel Smith wrote:
AFAICT the only real solution here is for np.sum and friends to propagate the keepdims argument if and only if it was explicitly passed to them (or maybe the slightly different, if and only if it has a non-default value). If we just started requiring code to handle it and passing it unconditionally, then as soon as someone upgraded numpy all their existing code might break for no good reason.
On May 5, 2015 8:13 AM, "Allan Haldane" <allanhaldane@gmail.com <mailto:allanhaldane@gmail.com>> wrote:
Hello all,
A question:
Many ndarray methods (eg sum, mean, any, min) have a "keepdims" keyword argument, but ndarray subclass methods sometimes don't. The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument.
This means you can't always use ndarray subclasses as 'drop in' replacement for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (eg np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think?
---- Details: ----
This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum). Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims.
I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, eg pandas.DataFrame. **kwargs tricks might work. But if it's permissible I think it would be better to require subclasses to support all the keyword args ndarray supports.
Allan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org> http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (4)
-
Allan Haldane
-
Nathaniel Smith
-
Neil Girdhar
-
Sebastian Berg