[Numpy-discussion] Records in scipy core

Travis Oliphant oliphant.travis at ieee.org
Fri Dec 2 10:37:03 EST 2005


Perry Greenfield wrote:

>For us, probably not critical since we have to do some rewriting anyway.
>(But it would be nice to retain for a while as deprecated).
Easy enough to do by defining an actual record array (however, see 
below).   I've been retaining backwards compatibility in other ways 
while not documenting it.  For example, you can actually now pass in 
strings like 'Int32' for types.

>But what about field names that don't map well to attributes?
>I haven't had a chance to reread the past emails but I seem to
>recall this was a significant issue. That would imply that .field()
>would be needed for those cases anyway.
What I'm referring to as the solution here is a slight modification to 
what Perry described.  In other words, all arrays have the attribute

.fields

You can set this attribute to a dictionary, which automagically gives 
field names to any array.  The dictionary has ordered lists of 'names', 
(optionally) 'titles', and "(data-descr, [offset])" entries that define 
the mapping.  If the offset is not given, then the "next-available" 
offset is assumed.  The data-descr is either 1) a data-type or 2) a 
tuple of (data-type, shape).  The data-type is either a defined 
data-type or alias, or an object with a .fields attribute that provides 
the same dictionary and an .itemsize attribute that computes the total 
size of the data-type.
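
To make the "next-available" offset rule concrete, here is a rough 
pure-Python sketch.  It is not scipy core code.  The 'layout' key, the 
BASIC_SIZES table and the function names are made up for illustration, 
and it assumes "next-available" means the first byte past every field 
placed so far.  Nested data-types (objects with their own .fields) are 
left out.

```python
from math import prod

# Hypothetical byte sizes for a few basic type names (illustration only).
BASIC_SIZES = {"int16": 2, "int32": 4, "float64": 8}


def item_size(data_descr):
    """Byte size of a data-descr: a type name or a (type name, shape) tuple."""
    if isinstance(data_descr, tuple):
        data_type, shape = data_descr
        return BASIC_SIZES[data_type] * prod(shape)
    return BASIC_SIZES[data_descr]


def resolve_offsets(fields):
    """Return {name: (data_descr, offset)}, filling in any missing offsets."""
    resolved = {}
    next_offset = 0
    for name, entry in zip(fields["names"], fields["layout"]):
        data_descr = entry[0]
        # Use the explicit offset if one is given, else the next available one.
        offset = entry[1] if len(entry) > 1 else next_offset
        resolved[name] = (data_descr, offset)
        # Assumption: the next free byte is past every field placed so far.
        next_offset = max(next_offset, offset + item_size(data_descr))
    return resolved


# 'layout' is a hypothetical key holding the (data-descr, [offset]) entries.
fields = {
    "names": ["id", "position", "weight"],
    "titles": ["identifier", "xyz position", "weight in kg"],
    "layout": [("int32",), (("float64", (3,)),), ("float64", 32)],
}
layout = resolve_offsets(fields)
```

Here 'id' and 'position' get their offsets filled in, while 'weight' 
keeps the explicit offset it was given.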


Getting this attribute returns a special fields object (written in 
Python initially, like the flags attribute) that can look up fields by 
name like a dictionary, or by attribute access for names that either 
1) are acceptable as Python identifiers or 2) have a user-provided 
"python-name" associated with them.

Thus,

.fields['home address']

would always work

but

.fields.hmaddr

would only work if the user had previously made the association hmaddr 
-> 'home address' for the data type of this array.   Thus 'home address' 
would be a title but hmaddr would be the name.

The records module would simply provide functions for making record 
arrays and a record data type. 

Driving my thinking is the concept that the notion of a record array is 
really a description of the data type of the array (not the array 
itself).  Thus, all the fields information should really just be part of 
the data type itself.  Now, I don't really want to create and register a 
new data type every time somebody has a new record layout.

So, I've been re-thinking the notion of "registering a data-type".  It 
seems to me that while it's O.K. to have a set of pre-defined data 
types, the notion of data-type ought to be flexible enough to allow the 
user to define one "on-the-fly". 

I'm thinking of ways to do this right now.  Any suggestions are welcome.


-Travis



 




