[Numpy-discussion] Rec array: numpy.rec vs numpy.array with complex dtype

Fri Jun 26 15:16:51 EDT 2009

On Jun 26, 2009, at 2:51 PM, Dan Yamins wrote:
>
> We've been using the numpy.rec classes to make record array objects.
>
> We've noticed that in more recent versions of numpy, record-array  
> like objects can be made directly with the numpy.ndarray class, by  
> passing a complex data type.

Hasn't it always been the case?

> However, it looks like the numpy.rec class is still supported.
>
> So, we have a couple of questions:
>
> 1) Which is the preferred way to make a record array, numpy.rec, or  
> the numpy.ndarray with complex data type?   A somewhat detailed  
> explanation of the comparative properties would be great.  (We know  
> it's buried somewhere in the document ... sorry for being lazy!)

Short answer:
An np.recarray is a subclass of ndarray with a structured dtype, where
fields can be accessed as attributes (as in 'yourarray.yourfield')
instead of as items (as in yourarray['yourfield']).
Under the hood, that means that the __getattribute__ method (and the
corresponding __setattr__) had to be overloaded (it needs to check
whether an attribute is a field or not), which slows things down
compared to a standard ndarray.
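
To make the difference concrete, here is a minimal sketch. The names x and rx are only illustrative, and the timings depend on your machine and numpy version, so no figures are quoted:

```python
import timeit
import numpy as np

x = np.array([(1, 10), (2, 20)], dtype=[('a', int), ('b', int)])
rx = x.view(np.recarray)

# Item access works on both arrays; attribute access only on the recarray.
same_values = bool((rx.a == x['a']).all())
plain_has_attr = hasattr(x, 'a')
print(same_values, plain_has_attr)

# Rough timing of the two access styles (results vary between runs).
t_item = timeit.timeit(lambda: x['a'], number=10000)
t_attr = timeit.timeit(lambda: rx.a, number=10000)
print("item access: %.4fs, attribute access: %.4fs" % (t_item, t_attr))
```

If attribute access turns out to matter in a tight loop, falling back to rx['a'] is always possible, since a recarray still supports item access.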

My favorite way to get a np.recarray is to define a standard ndarray
with a "complex" (structured) dtype, and then take a view as a recarray.
Example:
 >>> np.array([(1,10),(2,20)],dtype=[('a',int),('b',int)]).view(np.recarray)
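
One consequence of taking a view worth spelling out: the recarray and the original ndarray share the same data, so nothing is copied. A small sketch (x and rx are illustrative names):

```python
import numpy as np

x = np.array([(1, 10), (2, 20)], dtype=[('a', int), ('b', int)])
rx = x.view(np.recarray)

# The view keeps a reference to the array that owns the data.
is_view = rx.base is x

# Writing through the recarray is visible in the original array.
rx.a[0] = 99
print(is_view, x['a'][0])
```

If you need an independent recarray, take a copy first (x.copy().view(np.recarray)).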

>
> 2) The individual "records" in numpy.rec array have the  
> "numpy.record" type.   The individual records in the numpy.array  
> approach have "numpy.void" type.   Can you tell us a little about  
> how these differ, and what the advantages of one vs the other is?

Mmh:
 >>> x = np.array([(1,10),(2,20)],dtype=[('a',int),('b',int)])
 >>> rx = x.view(np.recarray)
 >>> type(x[0])
 <type 'numpy.void'>
 >>> type(rx[0])
 <type 'numpy.void'>

What numpy version are you using?
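
Since the answer apparently depends on the version, a quick check along these lines may help when comparing notes. It is only a sketch; the one thing it relies on is that np.record is a subclass of np.void, so code written against np.void should work with either result:

```python
import numpy as np

x = np.array([(1, 10), (2, 20)], dtype=[('a', int), ('b', int)])
rx = x.view(np.recarray)

# Report the version together with the element types it produces.
print(np.__version__)
print(type(x[0]), type(rx[0]))

# np.record derives from np.void, so this test is true in both cases.
record_is_void = issubclass(np.record, np.void)
element_is_void = isinstance(rx[0], np.void)
print(record_is_void, element_is_void)
```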

> 3) We've heard talk about "complex data types" in numpy in general.   
> Is there some good place we can read about this more extensively?

I think the proper term is 'structured data type', or 'structured  
array'.
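
For a feel of what a structured data type looks like in practice, here is a minimal sketch (the field names 'a' and 'b' are made up for illustration):

```python
import numpy as np

# A structured dtype is a list of (field name, field type) pairs.
dt = np.dtype([('a', int), ('b', float)])
print(dt.names)     # the field names, in order
print(dt['b'])      # the dtype of a single field

# Any array constructor accepts it; each element then holds one record.
arr = np.zeros(3, dtype=dt)
arr['b'] = [0.5, 1.5, 2.5]
print(arr['b'])
```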

>
>
> Also: one thing we use and like about the numpy.rec constructors is  
> that they can take a "names" argument, and the constructor function  
> does some inferring about what the formats you want are, e.g.:
>
>        img = numpy.rec.fromrecords([(0,1,'a'),(2,0,'b')], names =  
> ['A','B','C'])
>
> produces:
>
>       rec.array([(0, 1, 'a'), (2, 0, 'b')], dtype=[('A', '<i4'),  
> ('B', '<i4'), ('C', '|S1')])
>
> This is very convenient.
>
> My immediate guess for the equivalent thing with the numpy.ndarray  
> approach:
>
>       img = numpy.array([(0,1,'a'),(2,0,'b')], names = ['A','B','C'])
>
> does not work.   Is there some syntax for doing this?

You have to construct your dtype explicitly, as in "dtype=[('A',  
'<i4'), ('B', '<i4'), ('C', '|S1')]".
np.rec.fromrecords processes the array and tries to guess the best type
for each field, but it's slow and not always correct (what if your
third field should have been '|S3'?)