Proposal: add ndarray.keys() to return dtype.names

I first proposed this on GitHub: https://github.com/numpy/numpy/issues/5134 ; jaimefrio requested that I bring it to this list for discussion.
My proposal is to add a keys() method to NumPy's array class ndarray. The behavior would be to return self.dtype.names, i.e. the "column names" for a structured array (and None when dtype.names is None, which it is for pure numeric arrays without named columns).
I originally proposed to add a values() method also, but I am tabling that for now so we needn't discuss it in this thread.
The motivation is to enhance the ability to use duck typing with NumPy arrays, Python dicts, and other types like Pandas DataFrames, h5py Files, and more. It's a fairly common thing to want to get the "keys" of a container, where "keys" is understood to be a sequence of values one can pass to __getitem__(), and this is exactly what I'm aiming at.
Thoughts?
John Zwinck
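The proposed semantics can be tried out without changing NumPy itself. The sketch below assumes a hypothetical ndarray subclass, `KeyedArray` (not a NumPy API), whose `keys()` simply returns `self.dtype.names`, matching the proposal: a tuple of field names for a structured array and None for a plain numeric one.

```python
import numpy as np


class KeyedArray(np.ndarray):
    """Hypothetical ndarray subclass, used only to illustrate the proposal."""

    def keys(self):
        # Proposed behavior: the structured "column names", or None when
        # the dtype has no named fields (plain numeric arrays).
        return self.dtype.names


records = np.array([(1, 2.5), (3, 4.5)], dtype=[("x", "i4"), ("y", "f8")])
plain = np.arange(4.0)

keyed = records.view(KeyedArray)
print(keyed.keys())                   # ('x', 'y')
print(plain.view(KeyedArray).keys())  # None

# Every name returned by keys() can be passed to __getitem__:
for name in keyed.keys():
    print(name, keyed[name])
```

Viewing an existing array as the subclass shares its data, so this is a cheap way to experiment with the duck-typing behavior before committing to a change in ndarray.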

Sounds fair to me. Indeed, the duck-typing argument makes sense, and I have a hard time imagining any namespace conflicts or other confusion. Should this attribute return None for non-structured arrays, or simply be undefined?
On Tue, Sep 30, 2014 at 12:49 PM, John Zwinck <jzwinck@gmail.com> wrote:
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

I am also +1. I have already used structured arrays to do keyword-based string formatting. This makes sense as well. Would this enable keyword argument expansion?
On Tue, Sep 30, 2014 at 7:29 AM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:
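On keyword argument expansion: Python's `**` unpacking (in function calls and in `str.format`) and the `dict()` constructor accept any object that provides `keys()` and `__getitem__`, not only real dicts. The sketch below uses a hypothetical stand-in class `Row` (not a NumPy type) to show the mechanism; an ndarray gaining `keys()` would presumably plug into the same protocol, with the caveat that for a whole structured array `arr[name]` yields a column rather than a scalar.

```python
class Row:
    """Hypothetical record type: anything with keys() and __getitem__."""

    def __init__(self, names, values):
        self._data = dict(zip(names, values))

    def keys(self):
        return tuple(self._data)

    def __getitem__(self, key):
        return self._data[key]


def describe(x, y):
    return f"x={x}, y={y}"


row = Row(("x", "y"), (1, 2.5))
print(describe(**row))          # x=1, y=2.5
print("{x}/{y}".format(**row))  # 1/2.5
print(dict(row))                # {'x': 1, 'y': 2.5}
```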

I like this idea. But I am -1 on returning None if the array is unstructured. I expect .keys(), if present, to always return an iterable.
In fact, returning None would break some of my existing code, which checks for the existence of "keys" as a duck-typed test for dictionary-like objects (e.g., including pandas.DataFrame): https://github.com/xray/xray/blob/v0.3/xray/core/utils.py#L165
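To make the objection concrete: an existence check such as the hypothetical `is_dict_like` below (written in the spirit of the linked helper, not copied from it) is satisfied by any object that has a `keys` attribute. If `keys()` then returns None, the check passes and the failure only appears later, when the caller tries to iterate. `NoneKeys` is a stand-in for an unstructured array under the "return None" design.

```python
def is_dict_like(value):
    # Hypothetical duck-typed check: anything with keys() and
    # __getitem__ is treated as a mapping.
    return hasattr(value, "keys") and hasattr(value, "__getitem__")


class NoneKeys:
    """Stand-in for an unstructured array whose keys() returns None."""

    def keys(self):
        return None

    def __getitem__(self, key):
        raise KeyError(key)


def to_dict(value):
    if not is_dict_like(value):
        raise TypeError("not dict-like")
    # Iterating keys() is exactly what a None return value breaks.
    return {k: value[k] for k in value.keys()}


obj = NoneKeys()
print(is_dict_like(obj))  # True: the existence check is fooled
try:
    to_dict(obj)
except TypeError as exc:
    print("fails later:", exc)
```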

So a non-structured array should return an empty list/iterable as its keys? That doesn't seem right to me, but perhaps you have a compelling example to the contrary.
I mean, wouldn't we want the duck typing to fail if it isn't a structured array? Raising an AttributeError seems like the best thing to do, from a duck-typing perspective.
On Tue, Sep 30, 2014 at 8:05 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
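The difference between "empty keys" and "no keys" shows up in generic code. With the hypothetical stand-ins below, a duck-typed helper quietly returns an empty dict for an object whose `keys()` returns `()`, whereas an object with no `keys` attribute at all makes the same helper fail loudly, which is the behavior argued for above.

```python
class EmptyKeys:
    """Stand-in: an unstructured array whose keys() returns ()."""

    def keys(self):
        return ()

    def __getitem__(self, key):
        raise KeyError(key)


class NoKeys:
    """Stand-in: an unstructured array with no keys attribute at all."""

    def __getitem__(self, key):
        raise KeyError(key)


def fields_as_dict(value):
    # Duck-typed: rely on keys() only if the object provides it.
    keys = getattr(value, "keys", None)
    if keys is None:
        raise TypeError(f"{type(value).__name__} has no named fields")
    return {k: value[k] for k in keys()}


print(fields_as_dict(EmptyKeys()))  # {}  (quietly "succeeds")
try:
    fields_as_dict(NoKeys())
except TypeError as exc:
    print(exc)                      # NoKeys has no named fields
```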

On more careful reading of your words, I think we agree; indeed, if keys() is present it should return an iterable; but I don't think it should be present for non-structured arrays.
On Tue, Sep 30, 2014 at 10:21 PM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:

On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:
On more careful reading of your words, I think we agree; indeed, if keys() is present it should return an iterable; but I don't think it should be present for non-structured arrays.
Indeed, I think we do agree. The attribute can simply be missing (e.g., accessing it raises AttributeError) for non-structured arrays.

On 1 Oct 2014 04:30, "Stephan Hoyer" <shoyer@gmail.com> wrote:
Indeed, I think we do agree. The attribute can simply be missing (e.g., accessing it raises AttributeError) for non-structured arrays.
I'm generally fine with this, though I would like to know if there is precedent for methods being present on structured arrays only. Even if there is no precedent, I am still OK with the idea; I just think we should understand how novel this will be.

Well, the method will have to be present on all ndarrays, since structured arrays do not have a different type from regular arrays, only a different dtype. Thus the attribute has to be present regardless, but some exception will have to be raised depending on the dtype, to make it quack like the kind of duck it is, so to speak. Indeed, it seems like an atypical design pattern, but I don't see a problem with it.
On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck <jzwinck@gmail.com> wrote:
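The point that structured and plain arrays share one Python type can be checked directly. The sketch below also uses a hypothetical subclass, `FieldArray`, to show the consequence: a method defined on the class is visible to `hasattr()` on every instance, so any dtype-dependent behavior has to come from raising inside the call. ValueError is used only as a placeholder; which exception to raise is exactly what the thread goes on to debate.

```python
import numpy as np

plain = np.zeros(3)
records = np.zeros(3, dtype=[("x", "i4"), ("y", "f8")])

# Structured and plain arrays share one Python type; only the dtype differs.
print(type(plain) is type(records) is np.ndarray)  # True
print(plain.dtype.names, records.dtype.names)      # None ('x', 'y')


class FieldArray(np.ndarray):
    """Hypothetical subclass: keys() exists on every instance."""

    def keys(self):
        names = self.dtype.names
        if names is None:
            # Placeholder exception type for unstructured arrays.
            raise ValueError("array has no named fields")
        return names


p, r = plain.view(FieldArray), records.view(FieldArray)
print(hasattr(p, "keys"), hasattr(r, "keys"))  # True True
print(r.keys())                                # ('x', 'y')
```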

Actually, if I remember correctly, special methods show up in the ndarray object when the dtype is datetime64, right?
On Wed, Oct 1, 2014 at 10:13 AM, Eelco Hoogendoorn <hoogendoorn.eelco@gmail.com> wrote:

Ah yes; you can use..
from types import MethodType
...to dynamically add methods to specific instances of a type. This may be cleaner or more Pythonic than performing a check within the method, I dunno.
On Wed, Oct 1, 2014 at 4:41 PM, Benjamin Root <ben.root@ou.edu> wrote:
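A minimal sketch of the `types.MethodType` idea, using a hypothetical `Table` class rather than ndarray. One caveat worth checking before relying on it: plain `numpy.ndarray` instances have no per-instance `__dict__`, so assigning `arr.keys = MethodType(...)` on an ordinary array raises AttributeError. The trick would therefore only apply to a Python-level subclass, and it is an assumption here that new views or slices of such a subclass would not carry the bound method over unless extra code copied it.

```python
from types import MethodType


def _keys(self):
    # Plain function that becomes a bound method on selected instances.
    return self.names


class Table:
    """Hypothetical container; unlike ndarray, its instances have a __dict__."""

    def __init__(self, names=None):
        self.names = names
        if names is not None:
            # Attach keys() to this instance only, so hasattr() reflects
            # whether this particular object has named fields.
            self.keys = MethodType(_keys, self)


with_names = Table(("x", "y"))
without_names = Table()
print(hasattr(with_names, "keys"), with_names.keys())  # True ('x', 'y')
print(hasattr(without_names, "keys"))                  # False
```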

So, for non-structured arrays, the consensus is an exception. The question is which one. AttributeError would be fully backwards compatible: existing code checks for the method, and if it exists, the object has fields. ValueError would make more sense, as the value (the array) is in the wrong format/structure/type.
regards,
On 2014-10-01 16:47, Eelco Hoogendoorn wrote:
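The backwards-compatibility argument can be illustrated from the caller's side. The stand-in classes below are hypothetical: `UnstructuredAttr` models the AttributeError design (no `keys` at all) and `UnstructuredValue` models the ValueError design (`keys` exists but refuses). An existing `hasattr()`-based caller keeps working under the first design, but raises under the second, so callers would need a try/except instead.

```python
class Structured:
    """Stand-in for a structured array."""

    def keys(self):
        return ("x", "y")


class UnstructuredAttr:
    """Stand-in for the AttributeError design: keys is simply absent."""


class UnstructuredValue:
    """Stand-in for the ValueError design: keys exists but refuses."""

    def keys(self):
        raise ValueError("array has no named fields")


def names_via_hasattr(arr):
    # Existing-style caller: "has keys" is taken to mean "has fields".
    return list(arr.keys()) if hasattr(arr, "keys") else []


def names_via_try(arr):
    # Caller written to cope with either design.
    try:
        return list(arr.keys())
    except (AttributeError, ValueError):
        return []


print(names_via_hasattr(Structured()))        # ['x', 'y']
print(names_via_hasattr(UnstructuredAttr()))  # []
print(names_via_try(UnstructuredValue()))     # []
# names_via_hasattr(UnstructuredValue()) raises ValueError instead of
# returning [], which is the compatibility cost of the ValueError design.
```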

On Thu, Oct 2, 2014 at 6:27 PM, Sebastian Wagner <sebix@sebix.at> wrote:
So, for non-structured arrays, the consensus is an exception. The question is which one. AttributeError would be fully backwards compatible: existing code checks for the method, and if it exists, the object has fields. ValueError would make more sense, as the value (the array) is in the wrong format/structure/type.
If a non-structured array has its keys() raise AttributeError, I think that hasattr(arr, "keys") should return False, which implies that getattr(arr, "keys") should throw AttributeError. This would require that ndarray.__getattribute__ raise AttributeError, meaning "the attribute isn't here," not merely "the attribute doesn't have a value now." Otherwise people may rightly complain when they interrogate an array to see if it has keys(), find out that it does, but then get an error upon calling it which could have been known ahead of time.
Now, to actually implement it this way would seem to require setting the "tp_getattro" function pointer (https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_getattro). Currently PyArray_Type has this field as null. If I understand everything correctly, adding a non-null function pointer here would mean some small runtime overhead to resolve every attribute on ndarray. I could easily be missing some detail which would allow us to do what I described above without a performance hit, but someone better versed would need to explain how/why.
If I'm right about all that, and if the consensus is that keys() should raise an exception when dtype.names is None, then perhaps raising ValueError is the only viable option. I'd appreciate opinions from those experienced in the details of the C API.
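The hasattr() distinction above can be reproduced at the Python level. In the sketch below (hypothetical classes, not NumPy code), a plain method that raises AttributeError when called is still reported by `hasattr()`, whereas a property that raises AttributeError during attribute lookup makes `hasattr()` return False, without overriding `__getattribute__`. Whether the same descriptor-based approach would map cleanly onto ndarray's C-level attribute table, avoiding a custom tp_getattro, is an untested assumption here.

```python
class MethodStyle:
    """keys() exists as a method but raises when called."""

    names = None

    def keys(self):
        if self.names is None:
            raise AttributeError("keys")
        return self.names


class PropertyStyle:
    """keys is a property that raises AttributeError on lookup."""

    names = None

    @property
    def keys(self):
        if self.names is None:
            raise AttributeError("keys")
        # Return a callable so that obj.keys() still reads like a method.
        return lambda: self.names


print(hasattr(MethodStyle(), "keys"))    # True: only the call fails
print(hasattr(PropertyStyle(), "keys"))  # False: the lookup itself fails

s = PropertyStyle()
s.names = ("x", "y")
print(hasattr(s, "keys"), s.keys())      # True ('x', 'y')
```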
participants (5)
- Benjamin Root
- Eelco Hoogendoorn
- John Zwinck
- Sebastian Wagner
- Stephan Hoyer