[Numpy-discussion] Proposed record array behavior: the rest of the story: updated

Mon Jul 26 08:44:06 EDT 2004

I'll try to see if I can address all the comments raised (please let me know
if I missed something).

1) Russell Owen asked that indexing by field name not be permitted for
record arrays and at least one other agreed. Since it is easier to add
something like this later rather than take it away, I'll go along with that.
So while it will be possible to index a Record by field name, it won't be
for record arrays.

2) Russell asked if it would be possible to specify the types of the fields
using numarray/chararray type objects. Yes, it will. We will adopt Rick
White's 2nd suggestion for handling fields that themselves are arrays, i.e.,

formats = (3,Int16), ((4,5), Float32)

for a 1-d Int16 cell of shape (3,) and a 2-d Float32 cell of shape (4,5).
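
A rough sketch of how the (shape, type) form above could be normalized, in
plain Python. The function name normalize_format is hypothetical, and strings
stand in for the numarray type objects; this is not the numarray
implementation.

```python
def normalize_format(fmt):
    """Return (shape, type_name) for one field format.

    Hypothetical helper: accepts a bare type (scalar cell), or a pair
    (n, type) / (shape_tuple, type) for a cell that is itself an array.
    """
    if isinstance(fmt, tuple):
        if len(fmt) != 2:
            raise ValueError("expected (shape, type), got %r" % (fmt,))
        shape, type_name = fmt
        # An integer is shorthand for a 1-d shape.
        if isinstance(shape, int):
            shape = (shape,)
        else:
            shape = tuple(shape)
        return shape, type_name
    # A bare type describes a scalar cell.
    return (), fmt


# Strings stand in for Int16 and Float32 (an assumption for illustration).
formats = (3, "Int16"), ((4, 5), "Float32")
cells = [normalize_format(f) for f in formats]
```

Because each field is always a flat (shape, type) pair, there is no nesting to
validate beyond the length check.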

The first suggestion ("formats = 3*(Int16,), 4*(5*(Float32,),)") will not be
supported. While it is very suggestive, it allows inconsistent nestings
that must be checked and rejected (what if someone supplies
(Int16, Int16, Float32) as one of the fields?), which complicates the code.
It also doesn't read as well.
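
To illustrate the kind of checking the nested form would require, here is a
minimal sketch. The function infer_cell is hypothetical, and strings again
stand in for the type objects; it only shows that every level of nesting has
to be compared for consistency.

```python
def infer_cell(spec):
    """Return (shape, type_name) for a nested-tuple spec.

    Hypothetical helper: raises ValueError when sibling entries disagree
    in type or in shape.
    """
    if not isinstance(spec, tuple):
        return (), spec  # a bare type: scalar cell
    if not spec:
        raise ValueError("empty nesting")
    children = [infer_cell(item) for item in spec]
    first = children[0]
    # Every sibling must have the same shape and the same type.
    for child in children[1:]:
        if child != first:
            raise ValueError("inconsistent nesting: %r" % (spec,))
    shape, type_name = first
    return (len(spec),) + shape, type_name


ok_1d = infer_cell(3 * ("Int16",))
ok_2d = infer_cell(4 * (5 * ("Float32",),))
```

A spec such as ("Int16", "Int16", "Float32") makes infer_cell raise
ValueError, which is exactly the extra rejection path the (shape, type) form
avoids.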

3) Russell also suggested nesting record arrays. This sort of capability is
not being ruled out, but there isn't a chance we can devote resources to
this any time soon (can anyone else?).

4) To address the suggestions of Russell and Francesc, I'm proposing that
the current "field" method now become an object (callable to retain backward
compatibility) that supports:
   a) indexing by name or number (just like Records)
   b) name to attribute mapping (with restrictions).
So this means 3 ways to do things! As far as attribute access goes, I
simply do not want to throw arbitrary attributes into the main object
itself. The use of field is comparatively clean since it has no other
public attributes. Aside from mapping '_' into spaces, no other illegal
attribute characters will be mapped. (The identifier/label suggestion by
Colin Williams has some merit, but on the whole, I think it brings more
baggage than benefit.) The mapping algorithm tries to map the attribute
to any field name that has either a ' ' or a '_' in the place of each
'_' in the attribute name. A field name with '_' in all of those places
will take precedence over any other match, but there will be no
guaranteed order for the other cases (e.g., 'x_y z' vs 'x y_z' vs
'x y z'; though 'x_y_z' would be guaranteed to be selected for
field.x_y_z if present).
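
The mapping rule can be sketched roughly as follows. The function
resolve_field is a hypothetical name, and returning the first candidate is an
arbitrary choice made only for this sketch, since the order among non-exact
matches is not guaranteed.

```python
def resolve_field(attr, field_names):
    """Map an attribute name to a field name.

    Each '_' in attr may stand for either '_' or ' ' in the field name;
    every other character must match exactly.
    """
    # A field name with '_' in every position is an exact match and wins.
    if attr in field_names:
        return attr
    candidates = [
        name for name in field_names
        if len(name) == len(attr)
        and all(a == n or (a == "_" and n == " ") for a, n in zip(attr, name))
    ]
    if not candidates:
        raise AttributeError(attr)
    # Order among these is not guaranteed; taking the first is arbitrary.
    return candidates[0]


exact = resolve_field("x_y_z", ["x y z", "x_y z", "x_y_z"])
spaced = resolve_field("home_address", ["home address", "zip"])
```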

Note that the only real need to support indexing, other than consistency, is
to support slices. Slices will be supported only for numerical indexing (and
not initially). The callable syntax can support index arrays just as easily.

To summarize:

Rarr.field.home_address
Rarr.field['home address']
Rarr.field('home address')

will all work for a field named "home address".
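
For concreteness, a minimal sketch of an object that supports all three
spellings. The class FieldAccessor and its list-based column storage are
assumptions for illustration, not the numarray code; in this sketch, field
names starting with '_' are not reachable as attributes.

```python
class FieldAccessor:
    """Hypothetical stand-in for the proposed 'field' object."""

    def __init__(self, names, columns):
        self._names = list(names)
        self._columns = dict(zip(self._names, columns))

    def __call__(self, key):
        # Callable form, kept for backward compatibility; accepts a field
        # name or a field number.
        if isinstance(key, int):
            key = self._names[key]
        return self._columns[key]

    def __getitem__(self, key):
        # Indexing by name or number, as for Records.
        return self(key)

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr.startswith("_"):
            raise AttributeError(attr)
        if attr in self._columns:  # exact match takes precedence
            return self._columns[attr]
        for name in self._names:
            if name.replace(" ", "_") == attr:
                return self._columns[name]
        raise AttributeError(attr)


field = FieldAccessor(
    ["home address", "age"],
    [["1 Main St", "2 Oak Ave"], [30, 41]],
)
by_attr = field.home_address
by_index = field["home address"]
by_call = field("home address")
```

Keeping the attribute mapping on this separate object means no field name can
collide with an attribute of the record array itself.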

************************************************

Any comments on these changes to the proposal? Are there those that are
opposed to supporting attribute access?

Thanks, Perry