[Numpy-discussion] Adding the ability to "clone" a few fields from a data-type

Thu Oct 30 13:50:49 EDT 2008

Francesc Alted wrote:
> On Thursday 30 October 2008, Robert Kern wrote:
> [clip]
>   
>>>> OTOH, now that I think about it, I don't think there is really any
>>>> coherent way to mix field selection with any other indexing
>>>> operations. At least, not within the same brackets. Hmm. So maybe
>>>> the link to fancy indexing can be ignored as, ahem, fanciful.
>>>>         
>>> Well, one can always check that fields in the fancy list are either
>>> strings (mapping to named fields) or integers (mapping to positional
>>> fields). However, I'm not sure whether this check would be too
>>> expensive.
>>>       
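A minimal sketch of the string-versus-integer check described above, under the assumption that an all-string list means field names and an all-integer list means field positions. The helper name `resolve_field_list` is hypothetical and is not a NumPy API.

```python
import numpy as np

def resolve_field_list(dtype, keys):
    """Hypothetical helper: map a list of field keys to field names.

    All strings -> taken as field names (checked against the dtype).
    All integers -> taken as field positions (looked up in dtype.names).
    Anything else (mixed types, bools, floats) is rejected.
    """
    names = dtype.names
    if names is None:
        raise TypeError("dtype has no fields")
    if all(isinstance(k, str) for k in keys):
        missing = [k for k in keys if k not in names]
        if missing:
            raise KeyError(f"no such fields: {missing}")
        return list(keys)
    if all(isinstance(k, (int, np.integer)) and not isinstance(k, bool)
           for k in keys):
        return [names[k] for k in keys]
    raise TypeError("field list must be all strings or all integers")

dt = np.dtype("i4,f4,S5")                     # fields 'f0', 'f1', 'f2'
print(resolve_field_list(dt, ["f2", "f0"]))   # ['f2', 'f0']
print(resolve_field_list(dt, [1, 2]))         # ['f1', 'f2']
```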
>> That's not my concern. The problem is that the field-indexing applies
>> to the entire array, not just an axis. So what would the following
>> mean?
>>
>>   a[['foo', 'bar'], [1,2,3]]
>>
>> Compared to
>>
>>   a[[5,8,10], [1,2,3]]
>>     
>
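The asymmetry Robert points out can be seen in current NumPy: a field name is not tied to any axis and selects that field for every element, whereas each integer list in a fancy index is bound to one axis. The field names and array shape below are arbitrary choices for illustration.

```python
import numpy as np

a = np.zeros((12, 4), dtype=[("foo", "i4"), ("bar", "f8")])

# A field name selects that field for *every* element: the shape is unchanged.
print(a["foo"].shape)      # (12, 4)

# Integer fancy indexing pairs up the lists along axes 0 and 1,
# picking the elements (5, 1), (8, 2) and (10, 3); all fields are kept.
picked = a[[5, 8, 10], [1, 2, 3]]
print(picked.shape)        # (3,)
print(picked.dtype.names)  # ('foo', 'bar')
```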
> Well, as I see it, fields are like another axis, except that it is 
> always the leading one.  To cope with them we could generalize what 
> already works:
>
> In [15]: ra = numpy.zeros((3,4), "i4,f4")
>
> In [16]: ra['f1'][[1,2],[0,3]]  # this already works
> Out[16]: array([ 0.,  0.], dtype=float32)
>
> In [17]: ra[['f0','f1']][[1,2],[0,3]]   # this could be made to work
> Out[17]:
> array([(0, 0.0), (0, 0.0)],
>       dtype=[('f0', '<i4'), ('f1', '<f4')])
>
>   
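For reference, current NumPy accepts a list of field names as an index (since NumPy 1.16 it returns a view rather than a copy), so the two-step form in In [17] runs when written with fields that actually exist in `ra` ('f0' and 'f1' for the dtype "i4,f4"). This is a sketch against today's NumPy, not the API that existed when this thread was written.

```python
import numpy as np

ra = np.zeros((3, 4), "i4,f4")            # fields are named 'f0' and 'f1'

one = ra["f1"][[1, 2], [0, 3]]            # one field, then fancy indexing
both = ra[["f0", "f1"]][[1, 2], [0, 3]]   # several fields, then fancy indexing

print(one.tolist())        # [0.0, 0.0]
print(both.dtype.names)    # ('f0', 'f1')
print(both.tolist())       # [(0, 0.0), (0, 0.0)]
```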
>>>> Overall, I guess, I would present the feature slightly
>>>> differently. Provide a kind of inaccessible and invisible dtype
>>>> for implementing dummy fields. This is useful in other places like
>>>> file parsing. At the same time, implement a function that uses
>>>> this capability to make views with a subset of the fields of a
>>>> structured array. I'm not sure that people need an API for
>>>> replacing the fields of a dtype like this.
>>>>         
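Something close to the "inaccessible and invisible" filler can be expressed in current NumPy with the dictionary form of a dtype: list only the wanted fields with explicit offsets and set `itemsize` to the full record size, so the unnamed bytes are simply skipped. The 16-byte record layout below (int32 id, 4 unused bytes, float64 value) is an invented example for file parsing, not a format from the thread.

```python
import numpy as np

# Invented 16-byte record: int32 'id' at 0, 4 unused bytes, float64 'value' at 8.
# Bytes 4..7 belong to no field, so they act as an invisible dummy field.
rec = np.dtype({"names": ["id", "value"],
                "formats": ["<i4", "<f8"],
                "offsets": [0, 8],
                "itemsize": 16})

# Build two raw records by hand, as if read from a binary file.
raw = bytearray(32)
raw[0:4] = (7).to_bytes(4, "little")
raw[8:16] = np.array([2.5], "<f8").tobytes()
raw[16:20] = (9).to_bytes(4, "little")
raw[24:32] = np.array([-1.0], "<f8").tobytes()

data = np.frombuffer(bytes(raw), dtype=rec)
print(data["id"].tolist())     # [7, 9]
print(data["value"].tolist())  # [2.5, -1.0]
```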
>>> Mmh, I'm not sure what you are proposing there.  Do you mean something
>>> like:
>>>
>>> In [21]: t = numpy.dtype([('f0','i4'),('f1', 'f8'), ('f2', 'S20')])
>>>
>>> In [22]: nt = t.astype(['f2', 'f0'])
>>>
>>> In [23]: ra = numpy.zeros(10, dtype=t)
>>>
>>> In [24]: nra = ra.view(nt)
>>>
>>> In [25]: ra
>>> Out[25]:
>>> array([(0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
>>>       (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''), (0, 0.0, ''),
>>>       (0, 0.0, ''), (0, 0.0, '')],
>>>      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '|S20')])
>>>
>>> In [26]: nra
>>> Out[26]:
>>> array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0), ('', 0),
>>>       ('', 0), ('', 0), ('', 0)],
>>>      dtype=[('f2', '|S20'), ('f0', '<i4')])
>>>
>>> ?
>>>
>>> In that case, that would be a great feature to add.
>>>       
>> That's what Travis is proposing. I would like to see a function that
>> does this (however it is implemented under the covers):
>>
>>   nra = subset_fields(ra, ['f0', 'f2'])
>>     
>
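A minimal sketch of the `subset_fields()` function Robert asks for, assuming the offsets-and-itemsize dtype form of current NumPy: each kept field keeps its original offset and the record size is unchanged, so the result is a view and no data are copied. The name comes from the message above; it is not an existing NumPy function (in NumPy 1.16+ `ra[['f0', 'f2']]` gives a comparable view).

```python
import numpy as np

def subset_fields(arr, names):
    """Sketch of the proposed subset_fields(): a view of `arr` that exposes
    only the fields in `names`. Offsets and itemsize are taken from the
    original dtype, so the other fields become unnamed gaps."""
    fields = arr.dtype.fields  # name -> (dtype, offset[, title])
    sub = np.dtype({"names": list(names),
                    "formats": [fields[n][0] for n in names],
                    "offsets": [fields[n][1] for n in names],
                    "itemsize": arr.dtype.itemsize})
    return arr.view(sub)

t = np.dtype([("f0", "i4"), ("f1", "f8"), ("f2", "S20")])
ra = np.zeros(10, dtype=t)
nra = subset_fields(ra, ["f0", "f2"])

nra["f0"][0] = 42          # writes through to the original array
print(nra.dtype.names)     # ('f0', 'f2')
print(ra["f0"][0])         # 42
```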
> Interesting.
>
>   
>> With the view, I don't think you can reorder the fields as in your
>> example.
>>     
>
> That's a pity.  Providing a dtype with the notion of an internal reorder 
> can be very powerful in some situations.  But I guess that implementing 
> this would be complicated.
>
>   
In general I agree with the idea, but this is starting to sound like R's data 
frames. So, is part of the goal to replicate some of the functionality of R's 
data frames?
For example, consider the extract function 
(http://rweb.stat.umn.edu/R/library/base/html/Extract.data.frame.html);
there is also the cookbook example of assigning NULL to a named column to 
remove it (see http://www.r-cookbook.com/node/50).
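For comparison, current NumPy ships `numpy.lib.recfunctions.drop_fields`, which plays a role roughly similar to removing a column from an R data frame. Unlike the view-based proposals in this thread, it returns a new array (a copy). The array below is an illustrative example.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ra = np.zeros(3, dtype=[("f0", "i4"), ("f1", "f8"), ("f2", "S20")])

# Drop field 'f1'; usemask=False returns a plain ndarray rather than
# a masked array (the default is usemask=True).
trimmed = rfn.drop_fields(ra, ["f1"], usemask=False)
print(trimmed.dtype.names)  # ('f0', 'f2')
```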


Bruce