Adding a flag to allow integer array access and masking

I have not heard any feedback on my proposal to add a final object to the extended slice syntax of current Numeric to allow for unambiguous index-array and mask-array access.

As a modification to the proposal, suppose we just check whether the last argument (of at least two) is a 0d array of type signed byte (currently this is illegal and raises an error). This number would be a flag indicating how to interpret the previous objects. Of course these numbers would be hidden from the user, who would write:

    a[index_array, _I] = <values>
    b = a[index_array, _I]

or

    a[mask_array, _M] = <values>
    b = a[mask_array, _M]

where _M is a 0d signed byte array indicating that mask_array should be interpreted as a mask, while _I is a 0d signed byte array indicating that index_array should be interpreted as integers into the flattened version of a.

Other indexing schemes could be envisioned as well. For example,

    a[a1, a2, a3, _X]

could be the cross product of the integer arrays a1, a2, and a3, or

    a[a1, a2, a3, _Z]

could select elements from a by "zipping" the sequences a1, a2, and a3 together to form a list of tuples to grab from a.

Comments?
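To make the four proposed interpretations concrete, here is a minimal pure-Python sketch over a flat, row-major list. It is not Numeric code: the helper names (flat_offset, take_flat, take_mask, take_cross, take_zip) are hypothetical and only illustrate what _I, _M, _X and _Z would each select.

```python
from itertools import product

def flat_offset(shape, index):
    """Row-major offset of a multi-dimensional index tuple (hypothetical helper)."""
    offset = 0
    for extent, i in zip(shape, index):
        offset = offset * extent + i
    return offset

def take_flat(data, indices):
    """_I: integers index the flattened array."""
    return [data[i] for i in indices]

def take_mask(data, mask):
    """_M: keep the elements whose mask entry is true."""
    return [x for x, keep in zip(data, mask) if keep]

def take_cross(data, shape, *axes):
    """_X: every combination of the per-axis index lists."""
    return [data[flat_offset(shape, idx)] for idx in product(*axes)]

def take_zip(data, shape, *axes):
    """_Z: pair the per-axis index lists element by element."""
    return [data[flat_offset(shape, idx)] for idx in zip(*axes)]

shape = (2, 3, 4)
data = list(range(24))  # the element at (i, j, k) holds i*12 + j*4 + k

print(take_flat(data, [0, 5, 23]))                      # [0, 5, 23]
print(take_mask(data, [x % 7 == 0 for x in data]))      # [0, 7, 14, 21]
print(take_cross(data, shape, [0, 1], [0, 2], [1, 3]))  # 8 elements
print(take_zip(data, shape, [0, 1], [0, 2], [1, 3]))    # [1, 23]
```

With the same three index lists, the cross product returns eight elements while zipping returns two, which is exactly the ambiguity the flag is meant to resolve.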

Travis Oliphant writes:
I have not heard any feedback back on my proposal to add a final object to the extended slice syntax to current Numeric to allow for unambiguous index and mask-array access.
As a modification to the proposal, suppose we just check to see if the last argument (of at least two) is a 0d array of type signed byte (currently this is illegal and will raise an error). This number would be a flag indicating how to interpret the previous objects. Of course these numbers would be hidden from the user who would write:
    a[index_array, _I] = <values>
    b = a[index_array, _I]
or
    a[mask_array, _M] = <values>
    b = a[mask_array, _M]
where _M is a 0d signed byte array indicating that the mask_array should be interpreted as a mask while _I is a 0d signed byte array indicating that the index_array should be interpreted as integers into the flattened version of a.
Other indexing schemes could be envisioned as well
a[a1,a2,a3,_X] could be the cross product of the integer arrays a1, a2, and a3 for example.
or
a[a1, a2, a3, _Z] could select elements from a by "zipping" the sequences a1, a2, and a3 together to form a list of tuples to grab from a.
Comments?
Like Greg, I'm wary of having many different interpretations for indexing behavior. (I'm not even that crazy about having numarray handle boolean index arrays differently than the others, something we haven't implemented yet, and perhaps we shouldn't.)

Before discussing the merits of this, shouldn't we take the attitude that absence of feedback is not necessarily equivalent to approval, particularly for something that affects the public interface of the module? I would feel better about this if I saw several people affirming the need for such features rather than few openly opposing it.

But if one were to do something like this, I would use a different kind of object than 0d arrays, e.g., an instance of a class defined for just that purpose. You would really want to make sure that no data could mistakenly be interpreted as a flag, even if the chances were remote.

I would also not use an underscore as the beginning of the name. Maybe I'm wrong about this, but I've come to take that to mean it's a private variable that should not be used by users of the module, and this usage would blur that convention. Finally, the name of the flag should be descriptive (e.g. MaskInd).

But there could be better alternatives. As an example,

    x[nonzero(maskarray)]

instead of

    x[maskarray, MaskInd]

(Yes, it does generate a temporary, so that is a drawback.)

Perry

Like Greg I'm wary of having many different interpretations for indexing behavior (I'm not even that crazy about having numarray handle boolean index arrays differently than the others --something we haven't implemented yet, and perhaps we shouldn't).
You may be wary, but there are already multiple ways people think about using integers to index arrays. I'm trying to suggest a facility that allows several different interpretations of array access.
Before discussing the merits of this, shouldn't we take the attitude that absence of feedback is not necessarily equivalent to approval, particularly for something that affects the public interface of the module? I would feel better about this if I saw several affirming the need for such features rather than few openly opposing it.
I do have this view. I'm not changing anything, right now. Well, I affirm that this is one of the drawbacks of Numeric as compared with other array-oriented environments. We definitely need a way to index an array using integers and masks. I guess if nobody else feels this way, then I'm alone in my discomfort.
But if one were to do something like this, I would use a different kind of object than 0d arrays, e.g., an instance of a class defined for just that purpose.
We could do that as well.
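As a rough illustration of the "instance of a class defined for just that purpose" idea, the sketch below uses a dedicated marker class so that no ordinary data can be mistaken for a flag. Everything here is an assumption for illustration: IndexMode, FlatInd and FlaggedArray are hypothetical names (only MaskInd appears in the discussion), and the container is a plain 1-d list wrapper, not a Numeric array.

```python
class IndexMode:
    """Hypothetical marker type; only its instances are treated as flags."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"IndexMode({self.name!r})"

# Descriptive, public names rather than underscore-prefixed ones.
FlatInd = IndexMode("FlatInd")  # hypothetical name
MaskInd = IndexMode("MaskInd")  # name suggested in the thread

class FlaggedArray:
    """Toy 1-d container that inspects the last index for an IndexMode flag."""

    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, key):
        # A flag is recognised purely by type, never by value.
        if isinstance(key, tuple) and len(key) == 2 and isinstance(key[1], IndexMode):
            selector, mode = key
            if mode is FlatInd:
                return [self.data[i] for i in selector]
            if mode is MaskInd:
                return [x for x, keep in zip(self.data, selector) if keep]
            raise IndexError(f"unsupported index mode: {mode!r}")
        return self.data[key]

x = FlaggedArray([10, 20, 30, 40])
print(x[[1, 3], FlatInd])                      # [20, 40]
print(x[[True, False, True, False], MaskInd])  # [10, 30]
```

Because the check is an isinstance test on a type that carries no numeric data, an index array that happens to contain small integers can never be read as a flag.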
You would really want to make sure that no data could mistakenly be interpreted as a flag, even if the chances were remote. I would also not use an underscore as the beginning of the name.
I'm not particularly wedded to _I notation, it was just a start.
Maybe I'm wrong about this, but I've come to take that to mean it's a private variable that should not be used by users of the module, and that usage would confuse that. Finally, the name of the flag should be descriptive (e.g. MaskInd).
But there could be better alternatives. As an example,
x[nonzero(maskarray)] instead of x[maskarray, MaskInd]
I've thought about that, too. It would work if nonzero returned some class that stored away (but didn't copy) the maskarray info.

-Travis
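One way to picture a nonzero that stores the mask without copying it is sketched below. This is an assumption about how such a class might look, not how Numeric's nonzero behaves: NonzeroOf and MaskAwareArray are hypothetical names, and the container is a plain list wrapper.

```python
class NonzeroOf:
    """Hypothetical result of nonzero(mask): holds a reference to the mask."""

    def __init__(self, mask):
        self.mask = mask  # reference only, no copy is made

    def __iter__(self):
        # Integer positions are produced lazily, only if someone asks for them.
        return (i for i, flag in enumerate(self.mask) if flag)

def nonzero(mask):
    return NonzeroOf(mask)

class MaskAwareArray:
    """Toy 1-d container that recognises NonzeroOf and uses its mask directly."""

    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, key):
        if isinstance(key, NonzeroOf):
            # No temporary index list: walk the data and the mask together.
            return [x for x, flag in zip(self.data, key.mask) if flag]
        if isinstance(key, (list, tuple)):
            return [self.data[i] for i in key]
        return self.data[key]

mask = [0, 1, 1, 0]
x = MaskAwareArray([5, 6, 7, 8])
print(x[nonzero(mask)])     # [6, 7]
print(list(nonzero(mask)))  # [1, 2]
```

The caller still writes x[nonzero(maskarray)], but the array can see that the index came from a mask and skip building the temporary index array.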

Travis Oliphant <oliphant@ee.byu.edu> writes:
Well, I affirm that this is one of the drawbacks of Numeric as compared with other array-oriented environments. We definitely need a way to index an array using integers and masks.
I guess if nobody else feels this way, then I'm alone in my discomfort.
No, I basically agree; I just don't have that need immediately and therefore am less motivated to work on it.

My preferred solution would be to use special objects (in the spirit of the slice object) for special indexing methods, rather than special cases of existing objects. The advantage is that any number of those can be added over time as the need arises, and there is never a risk of changing the meaning of existing code.

However, I do think that this should be thought out and discussed carefully, but unfortunately I won't be able to help much due to lack of time.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------

No, I basically agree, I just don't have that need immediately and therefore am less motivated to work on it.
My preferred solution would be to use special objects (in the spirit of the slice object) for special indexing methods, rather than special cases of existing objects. The advantage is that any number of those can be added over time as the need arises, and there is never a risk of changing the meaning of existing code.
Thanks for the comments you have made. I always appreciate them.

Are you suggesting something like:

    b = IndexArray([1,3,10,100])
    a[b]?

This is really not much different than

    a[[1,3,10,100],IndexArray]

which is essentially what I've suggested (I was looking for shortcuts), but in principle IndexArray could be a class with a method that the code in Numeric interfaces with.
However, I do think that this should be thought out and discussed carefully, but unfortunately I won't be able to help much due to lack of time.
Thanks for participating thus far. -Travis

Are you suggesting something like:
b = IndexArray([1,3,10,100])
a[b]?
Exactly. With IndexArray being some special object (if only a thin wrapper), that prints differently from a simple array and can be type-tested.
This is really not much different than.
a[[1,3,10,100],IndexArray]
Except that in the first case, there is exactly one indexing object per axis, the operation can be a different one along each axis, and the index object specifies its own meaning. But the effect is the same, of course.

Konrad.
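The "one indexing object per axis" idea can be sketched as follows. This is only an illustration under assumptions: IndexArray is the name used in the thread, while MaskArray, the resolve method and the Grid container are hypothetical and stand in for whatever interface Numeric would actually expose.

```python
class IndexArray:
    """Thin wrapper marking a sequence as integer indices along one axis."""

    def __init__(self, indices):
        self.indices = list(indices)

    def resolve(self, extent):
        return self.indices

    def __repr__(self):
        # Prints differently from a plain list, so it is easy to recognise.
        return f"IndexArray({self.indices!r})"

class MaskArray:
    """Hypothetical wrapper marking a sequence as a mask along one axis."""

    def __init__(self, mask):
        self.mask = list(mask)

    def resolve(self, extent):
        if len(self.mask) != extent:
            raise IndexError("mask length must match axis length")
        return [i for i, keep in enumerate(self.mask) if keep]

class Grid:
    """Toy 2-d container: each axis gets its own index object."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, key):
        row_sel, col_sel = key
        row_ids = row_sel.resolve(len(self.rows))
        col_ids = col_sel.resolve(len(self.rows[0]))
        return [[self.rows[r][c] for c in col_ids] for r in row_ids]

g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Integer indices along the rows, a mask along the columns.
print(g[IndexArray([0, 2]), MaskArray([True, False, True])])  # [[1, 3], [7, 9]]
```

Each index object carries its own interpretation, so a different operation can be used along each axis and new kinds of index object can be added later without changing the meaning of existing code.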
Participants: Konrad Hinsen, Perry Greenfield, Travis Oliphant