[Numpy-discussion] Recarray attributes writeable

Sat Jun 17 04:17:28 EDT 2006

On Fri, 16 Jun 2006 at 14:46 -0700, Andrew Straw wrote:
> Erin Sheldon wrote:
> 
> >Anyway - Recarrays have convenience attributes such that
> >fields may be accessed through "." in addition to
> >the "field()" method.  These attributes are designed to be
> >read-only; one cannot alter the data through them.
> >Yet they are writeable:
> >
> >>>> tr = numpy.recarray(10, formats='i4,f8,f8', names='id,ra,dec')
> >>>> tr.field('ra')[:] = 0.0
> >>>> tr.ra
> >array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
> >
> >>>> tr.ra = 3
> >>>> tr.ra
> >3
> >>>> tr.field('ra')
> >array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
> >
> >I feel this should raise an exception, just as with trying to write
> >to the "size" attribute. Any thoughts?
> >  
> >
> I have not used recarrays much, so take this with the appropriate 
> measure of salt.
> 
> I'd prefer to drop the entire pseudo-attribute thing completely before 
> it gets entrenched. (Perhaps it's too late.)
> 

However, I think that this has its utility, especially when accessing
nested fields (see below). In addition, I'd suggest introducing a
special accessor called, say, 'fields' in order to access the fields
themselves and not the attributes. For example, if you want to access
the 'strides' attribute, you can do it in the usual way:

>>> import numpy
>>> tr=numpy.recarray(10, formats='i4,f8,f8', names='id,ra,strides')
>>> tr.strides
(20,)

but if you want to access the *field* 'strides', you could do it with:

>>> tr.fields.strides
<repr of field accessor object (shape, type...)>
>>> tr.fields.strides[:]
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

We have several advantages in adopting the previous approach:

1. You don't mix (nor pollute) the namespaces for attributes and fields.

2. You have a clear idea when you are accessing a variable or a field.

3. Accessing nested columns would still be very easy:
tr.field('nested1').field('nested2').field('nested3') vs
tr.fields.nested1.nested2.nested3

4. You can also define a proper __getitem__ for accessing fields:
tr.fields['nested1']['nested2']['nested3'].
In the same way, elements of the 'nested2' field could be accessed with:
tr.fields['nested1']['nested2'][2:10:2].

5. Finally, you can even prevent setting or deleting columns by
disabling the __setattr__ and __delattr__.
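
To make the five points above a bit more concrete, here is a minimal
sketch of what such an accessor could look like. Note that the class
name 'FieldAccessor' and the way it is attached are just assumptions for
illustration (nothing like this exists in numpy right now), and it wraps
a plain structured array rather than a real recarray:

```python
import numpy as np


class FieldAccessor:
    """Hypothetical accessor exposing only the *fields* of a structured array."""

    __slots__ = ("_arr",)

    def __init__(self, arr):
        # Bypass our own __setattr__, which is disabled below.
        object.__setattr__(self, "_arr", arr)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so ndarray attributes such
        # as 'strides' never shadow a field of the same name.
        if name.startswith("_"):
            raise AttributeError(name)
        names = self._arr.dtype.names or ()
        if name not in names:
            raise AttributeError("no field named %r" % name)
        # Wrap the field again so that nested fields can be chained.
        return FieldAccessor(self._arr[name])

    def __getitem__(self, key):
        # Strings select a field; anything else indexes the data itself.
        if isinstance(key, str):
            return getattr(self, key)
        return self._arr[key]

    def __setattr__(self, name, value):
        raise AttributeError("fields cannot be rebound through the accessor")

    def __delattr__(self, name):
        raise AttributeError("fields cannot be deleted through the accessor")

    def __repr__(self):
        return "<FieldAccessor shape=%s dtype=%s>" % (self._arr.shape, self._arr.dtype)


# A structured array with a field that clashes with an ndarray attribute.
tr = np.zeros(10, dtype=[("id", "i4"), ("ra", "f8"), ("strides", "f8")])
fields = FieldAccessor(tr)

tr.strides           # the usual ndarray attribute
fields.strides[:]    # the data of the field named 'strides'
fields["ra"][2:10:2] # string key for the field, slice for its elements
```

With this, something like 'fields.ra = 3' raises an AttributeError
instead of silently creating a new attribute, while the data can still
be modified in place through the array returned by 'fields.ra[:]'.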

PyTables has adopted a similar scheme for accessing nested columns,
except for 4, where we decided not to accept both strings and slices in
the __getitem__() method (you know the mantra: "there should preferably
be just one way of doing things", although maybe we've been a bit too
strict in this case), and I think it works reasonably well. In any
case, the idea is to decouple the attributes and fields so that they
don't get mixed.

Implementing this shouldn't be complicated at all, but I'm afraid that I
can't do this right now :-(

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"