[Numpy-discussion] Recarray attributes writeable

Sat Jun 17 09:40:20 EDT 2006

This reply was sent at 9:36 AM, Jun 17 (I mention this because it may not
show up for a day or so from my Gmail account, if it shows up at all).

On 6/17/06, Francesc Altet <faltet at carabos.com> wrote:
> El dv 16 de 06 del 2006 a les 14:46 -0700, en/na Andrew Straw va
> escriure:
> > Erin Sheldon wrote:
> >
> > >Anyway - Recarrays have convenience attributes such that
> > >fields may be accessed through "." in addition to
> > >the "field()" method.  These attributes are designed to be
> > >read-only; one cannot alter the data through them.
> > >Yet they are writeable:
> > >
> > >>>> tr=numpy.recarray(10, formats='i4,f8,f8', names='id,ra,dec')
> > >>>> tr.field('ra')[:] = 0.0
> > >>>> tr.ra
> > >array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
> > >>>> tr.ra = 3
> > >>>> tr.ra
> > >3
> > >>>> tr.field('ra')
> > >array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
> > >
> > >I feel this should raise an exception, just as with trying to write
> > >to the "size" attribute. Any thoughts?
> > >
> > >
> > I have not used recarrays much, so take this with the appropriate
> > measure of salt.
> >
> > I'd prefer to drop the entire pseudo-attribute thing completely before
> > it gets entrenched. (Perhaps it's too late.)
> >
>
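Erin's report above is that `tr.ra = 3` silently rebinds the name instead of failing. A minimal sketch of the behaviour she asks for, assuming present-day NumPy and Python 3: `StrictRecarray` is a hypothetical subclass (not part of NumPy) that refuses attribute assignment to any name that is also a field name and points the user at `.field()` instead. It illustrates the idea only; it is not how NumPy itself handles this.

```python
import numpy as np


class StrictRecarray(np.recarray):
    """Hypothetical recarray subclass: field attributes cannot be assigned."""

    def __setattr__(self, attr, val):
        # Refuse to rebind any name that is also a field name.
        if attr in (self.dtype.names or ()):
            raise AttributeError(
                f"field {attr!r} is read-only via '.'; "
                f"use .field({attr!r})[:] = ... instead"
            )
        # Everything else (including internal 'dtype' updates) goes through.
        super().__setattr__(attr, val)


tr = StrictRecarray(10, formats='i4,f8,f8', names='id,ra,dec')
tr.field('ra')[:] = 0.0

try:
    tr.ra = 3  # the case from Erin's report; here it raises
except AttributeError as exc:
    print(exc)

print(tr.field('ra'))  # field data untouched
```

Raising `AttributeError` mirrors what Python already does for read-only properties such as `size`, which is the comparison Erin draws.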

My initial inclination is to agree with dropping them.  I am new to numpy,
however, so they are not entrenched for me.  Anyway, see below.

> However, I think that this has its utility, especially when accessing
> nested fields (see later). In addition, I'd suggest introducing a
> special accessor called, say, 'fields' in order to access the fields
> themselves and not the attributes. For example, if you want to access
> the 'strides' attribute, you can do it in the usual way:
>
> >>> import numpy
> >>> tr=numpy.recarray(10, formats='i4,f8,f8', names='id,ra,strides')
> >>> tr.strides
> (20,)
>
> but, if you want to access *field* 'strides' you could do it by issuing:
>
> >>> tr.fields.strides
> <repr of field accessor object (shape, type...)>
> >>> tr.fields.strides[:]
> array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>
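The namespace clash behind the `strides` example is easy to reproduce in present-day NumPy without any new accessor: when a field shares its name with an ndarray attribute, `.` lookup returns the ndarray attribute, and the field is reachable only by name. A small sketch, assuming current NumPy; it uses `np.zeros` so the data are initialised.

```python
import numpy as np

# Same record layout as the example above, with a field named 'strides'.
dt = [('id', 'i4'), ('ra', 'f8'), ('strides', 'f8')]
tr = np.zeros(10, dtype=dt).view(np.recarray)

# '.' finds the ndarray attribute first, so the field is shadowed.
print(tr.strides)            # (20,): 4 + 8 + 8 bytes per record

# The field is still reachable by name, just not through '.'.
print(tr.field('strides'))
print(tr['strides'])
```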
> We have several advantages in adopting the previous approach:
>
> 1. You don't mix (or pollute) the namespaces for attributes and fields.
>
> 2. You have a clear idea when you are accessing a variable or a field.
>
> 3. Accessing nested columns would still be very easy:
> tr.field('nested1').field('nested2').field('nested3') vs
> tr.fields.nested1.nested2.nested3
>
> 4. You can also define a proper __getitem__ for accessing fields:
> tr.fields['nested1']['nested2']['nested3'].
> In the same way, elements of 'nested2' field could be accessed by:
> tr.fields['nested1']['nested2'][2:10:2].
>
> 5. Finally, you can even prevent setting or deleting columns by
> disabling the __setattr__ and __delattr__.
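A rough sketch of the proposed accessor, covering points 3 to 5. `Fields` is a hypothetical class written for illustration (NumPy arrays have no such accessor); it wraps a structured array, resolves attribute and string lookups to fields only, returns the underlying data for integer or slice keys, and disables `__setattr__`/`__delattr__`. The nested dtype is an assumed example.

```python
import numpy as np


class Fields:
    """Hypothetical '.fields' accessor; not a NumPy API."""

    __slots__ = ("_arr",)

    def __init__(self, arr):
        # __setattr__ is disabled below, so bypass it once here.
        object.__setattr__(self, "_arr", arr)

    def _names(self):
        return self._arr.dtype.names or ()

    def _wrap(self, sub):
        # Structured sub-fields get another accessor; leaf fields are arrays.
        return Fields(sub) if sub.dtype.names else sub

    def __getattr__(self, name):
        # Only called when normal lookup fails. Underscore names are
        # reserved for the accessor itself (this also avoids recursion).
        if not name.startswith("_") and name in self._names():
            return self._wrap(self._arr[name])
        raise AttributeError(f"no field named {name!r}")

    def __getitem__(self, key):
        if isinstance(key, str):
            if key not in self._names():
                raise KeyError(key)
            return self._wrap(self._arr[key])
        # Non-string keys (ints, slices) select elements and return the data.
        return self._arr[key]

    def __setattr__(self, name, value):
        raise AttributeError("fields cannot be rebound through the accessor")

    def __delattr__(self, name):
        raise AttributeError("fields cannot be deleted through the accessor")

    def __repr__(self):
        return f"<fields {self._names()}>"


dt = np.dtype([("id", "i4"),
               ("nested1", [("nested2", [("nested3", "f8"), ("flag", "i1")])])])
tr = np.zeros(10, dtype=dt)
tr["nested1"]["nested2"]["nested3"] = np.arange(10)

f = Fields(tr)
print(f.nested1.nested2.nested3)           # point 3: attribute style
print(f["nested1"]["nested2"]["nested3"])  # point 4: same data via strings
print(f["nested1"]["nested2"][2:10:2])     # point 4: records 2, 4, 6, 8

try:
    f.nested1 = 0                          # point 5: rebinding is refused
except AttributeError as exc:
    print(exc)
```

Accepting both strings and slices in `__getitem__` is the choice PyTables declined, as Francesc mentions; dropping the non-string branch would give the stricter variant.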

This is interesting, and I would add a 6th to this:

6. The .fields attribute by itself could return the names of the fields, which are
currently not accessible in any simple way.  I always think that these
should be methods (.fields(), .size(), etc.), but if we are going down
the attribute route, this might be a simple fix.
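In present-day NumPy the field names are reachable through the dtype: `tr.dtype.names` is a tuple in definition order, and `tr.dtype.fields` maps each name to its field dtype and byte offset. Whether these were available in the mid-2006 release discussed here is not established by this thread, so treat the sketch as describing current NumPy only.

```python
import numpy as np

tr = np.recarray(10, formats='i4,f8,f8', names='id,ra,dec')

# Field names, in definition order.
print(tr.dtype.names)        # ('id', 'ra', 'dec')

# dtype.fields maps name -> (field dtype, byte offset[, title]).
for name in tr.dtype.names:
    fdtype, offset = tr.dtype.fields[name][:2]
    print(name, fdtype, offset)
```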

>
> PyTables has adopted a similar schema for accessing nested columns,
> except for 4, where we decided not to accept both strings and slices for
> the __getitem__() method (you know the mantra: "there should preferably
> be just one way of doing things", although maybe we've been a bit too
> strict in this case), and I think it works reasonably well. In any
> case, the idea is to decouple the attributes and fields so that they
> don't get mixed.

Access by field name (string) or field number greatly improves
scriptability, but this can always be done through the .field() method.
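For scripted access, present-day NumPy's `recarray.field()` accepts either a field name or an integer field index, so loops over fields need no hard-coded attribute names. A short sketch, assuming current NumPy:

```python
import numpy as np

dt = [('id', 'i4'), ('ra', 'f8'), ('dec', 'f8')]
tr = np.zeros(5, dtype=dt).view(np.recarray)

# recarray.field() accepts a field name or an integer index.
tr.field('id')[:] = np.arange(5)
tr.field(1)[:] = 10.0        # index 1 is 'ra'

# Loop over every field without hard-coding attribute names.
for i, name in enumerate(tr.dtype.names):
    print(i, name, tr.field(i).mean())
```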

Erin