[Numpy-discussion] Records in scipy core

Travis Oliphant oliphant at ee.byu.edu
Thu Dec 1 17:12:09 EST 2005

Christopher Hanley wrote:

>Hi Travis,
>About a year ago (summer 2004) on the numpy distribution list there was 
>a lot of discussion of the records interface.  I will dig through my 
>notes and put together a summary.

Thanks for the pointers.  I had forgotten about that discussion, so I 
went back and re-read the thread.

Here's a good link for others to re-read (the end of) this thread:


I think some very good points were made.  These points should be 
addressed from the context of scipy arrays which now support records in 
a very basic way.   Because of this, we can support nested records of 
records, but how this is to be presented to the user is still an open 
question (i.e. how do you build one?).
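
To make the "how do you build one" question concrete, here is a minimal 
sketch of a nested record.  It uses the dtype syntax of present-day NumPy, 
which is an assumption on my part and not the scipy core interface under 
discussion; the names point, segment and segs are made up for illustration.

```python
import numpy as np

# Inner record: a 2-D point (hypothetical example type).
point = np.dtype([("x", np.float64), ("y", np.float64)])

# Outer record: an id plus two nested point records.
segment = np.dtype([("id", np.int32), ("start", point), ("end", point)])

segs = np.zeros(2, dtype=segment)
# A nested tuple fills the nested records of one element.
segs[0] = (7, (0.0, 0.0), (3.0, 4.0))

# Chained field access reaches into the nested record.
end_x = segs["end"]["x"]
```

Here the nesting is expressed simply by using one record data-type as the 
format of a field in another.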

I've finally been converted to the belief that the notion of records is 
very important, because it speaks to how to correctly design the basic 
(typeless, mathless) array object that will go into Python.  If we can get 
the general records type done right, then all the other types are 
examples of it.

Thus, I would like to revive discussion of the record object for 
inclusion in scipy core.  I pretty much agree with the semantics that 
Perry described in his final email (is this all implemented in numarray, 
yet?), except I would agree with Francesc Alted that a titles or labels 
concept should be allowed. 
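
As an illustration of what a titles concept can look like, here is a small 
sketch using the "titles" key that present-day NumPy accepts in a dtype 
specification.  This is an assumption about one possible design, not a 
description of what Perry or Francesc proposed; the field names and titles 
are invented.

```python
import numpy as np

# Short field names plus longer descriptive titles (hypothetical fields).
dt = np.dtype({
    "names": ["t", "p"],
    "formats": [np.float64, np.float64],
    "titles": ["Time (s)", "Pressure (Pa)"],
})

a = np.zeros(3, dtype=dt)
a["t"] = [0.0, 0.5, 1.0]

# The title acts as an alias for the field name.
by_title = a["Time (s)"]

# The field description carries the title as a third entry.
t_info = dt.fields["t"]
```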

I'm more enthusiastic about code than discussion, so I'm hoping for a 
short-lived discussion followed by actual code.  I'm ready to do the 
implementation this week (I've already borrowed lots of great code from 
numarray which makes it easier), but feel free to chime in even if you 
read this later.

In my mind, the discussion about the records array is primarily a 
discussion about the records data-type.  The way I'm thinking, the scipy 
ndarray is a homogeneous collection of the same "thing."  The big change 
in scipy core is that Numeric used to allow only certain data types, but 
now the ndarray can contain an arbitrary "void" data type.  You can also 
add data-types to scipy core.  These data-types are "almost" full 
members of the scipy data-type community.  The "almost" is because the 
N*N casting matrix is not updated (this would require a re-design of 
how casting is considered).   At some point, I'd like to fix this wart 
and make it so that data-types can be added at will.  I think if we get 
the record type right, I'll be able to figure out how to do this.

We need to add a "record" data-type to scipy.  Then, any array can be of 
"record" type, and there will be an additional "array scalar" that is 
what is returned when selecting a single element from the array.   So, a 
record array would simply be an array of "records" plus some extra stuff 
for dealing with the mapping from field names to actual segments of the 
array element (we may decide that this mapping is general enough that 
all scipy arrays should have the capability of assigning names to 
sub-bytes of their main data-type, along with a means of accessing those 
sub-bytes, in which case the subclass is unnecessary). 
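
A minimal sketch of the idea, written against present-day NumPy (an 
assumption; the scipy core spelling at the time of this message may differ). 
The record layout and the names rec and people are hypothetical.

```python
import numpy as np

# A "record" data-type: each array element has three named segments.
rec = np.dtype([("name", "S8"), ("age", np.int32), ("weight", np.float64)])

people = np.array([(b"alice", 31, 55.5), (b"bob", 45, 82.0)], dtype=rec)

# Selecting a field gives an ordinary array over that segment of each element.
ages = people["age"]

# Selecting a single element gives an array scalar for the record.
one = people[0]
one_age = one["age"]
```

The array itself stays a homogeneous collection of one "thing"; only the 
data-type knows how that thing is divided into named pieces.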

Let me explain further:  Right now, the machinery is in place in 
scipy_core to get and set in any ndarray (regardless of its data-type) 
an arbitrary "field".  A "field" in this context is defined as a 
sub-section of the basic element making up the array.   Generically, the 
sub-section is defined by an offset and a data-type, or a tuple of a data 
type and a shape (to allow sub-arrays in a record).    What I understand 
the user to want is the binding of a name to this generic sub-section.

1) Should we allow that for every scipy ndarray?  Complex data types 
have an obvious binding, but would anybody want to name the first two 
bytes of their int32 array?  I suggest holding off on this one until a 
records array is working.

2) Supposing we don't go with number 1, we need to design a record data 
type that has this name-binding capability.

The recarray class in scipy core SVN essentially just does this.
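
The two options above can be sketched as follows.  This uses getfield and 
dtype offsets as they exist in present-day NumPy, which is an assumption 
and not necessarily the scipy core API of this message; the names halves 
and particle are hypothetical.

```python
import numpy as np

# Explicitly little-endian so the byte layout is the same on every machine.
a = np.array([1, 65536 + 2], dtype="<i4")

# Unnamed field: a data-type plus a byte offset into each element.
low = a.getfield(np.dtype("<i2"), 0)    # first two bytes of each int32
high = a.getfield(np.dtype("<i2"), 2)   # last two bytes of each int32

# Named binding of the same two segments, applied as a view on the same data.
halves = np.dtype({"names": ["low", "high"],
                   "formats": ["<i2", "<i2"],
                   "offsets": [0, 2]})
named = a.view(halves)

# A field given as (data-type, shape) holds a sub-array in each record.
particle = np.dtype([("id", np.int32), ("pos", np.float64, (3,))])
arr = np.zeros(2, dtype=particle)
pos = arr["pos"]
```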

Question:  How important is backwards compatibility with the old numarray 
specification?  In particular, should we go with the .fields access 
described by Perry and eliminate the .field() approach?  That would be 
my preference.
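
For comparison, here is how the access styles look in present-day NumPy's 
record arrays.  This is only an illustration; whether it matches the 
.fields semantics Perry described is not established in this message, and 
the field names are invented.

```python
import numpy as np

r = np.rec.array([(1, 2.5), (2, 3.5)],
                 dtype=[("id", np.int32), ("val", np.float64)])

by_method = r.field("val")   # method-call style, as in .field()
by_attr = r.val              # attribute style
by_key = r["val"]            # plain indexing by field name

# The data-type's fields mapping: name -> (data-type, byte offset).
layout = r.dtype.fields
val_dtype, val_offset = layout["val"]
```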

Thanks for reading and any comments you can make.

