[Numpy-discussion] Rec array: numpy.rec vs numpy.array with complex dtype
Pierre GM
pgmdevlist at gmail.com
Fri Jun 26 15:16:51 EDT 2009
On Jun 26, 2009, at 2:51 PM, Dan Yamins wrote:
>
> We've been using the numpy.rec classes to make record array objects.
>
> We've noticed that in more recent versions of numpy, record-array
> like objects can be made directly with the numpy.ndarray class, by
> passing a complex data type.
Hasn't it always been the case?
> However, it looks like the numpy.rec class is still supported.
>
> So, we have a couple of questions:
>
> 1) Which is the preferred way to make a record array, numpy.rec, or
> the numpy.ndarray with complex data type? A somewhat detailed
> explanation of the comparative properties would be great. (We know
> it's buried somewhere in the document ... sorry for being lazy!)
Short answer:
a np.recarray is a subclass of ndarray with a structured dtype, where
fields can be accessed as attributes (as in 'yourarray.yourfield')
instead of as items (as in yourarray['yourfield']).
Under the hood, that means that the __getattribute__ method (and the
corresponding __setattr__) had to be overloaded (you need to check
whether an attribute is a field or not), which slows things down
compared to a standard ndarray.
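A minimal sketch of the difference (the array contents and field names
are only illustrative): attribute access is available on the recarray
view but not on the plain structured ndarray, while item access works
on both.

```python
import numpy as np

# A plain ndarray with a structured dtype (two integer fields).
x = np.array([(1, 10), (2, 20)], dtype=[('a', int), ('b', int)])

# The same data seen as a recarray.
rx = x.view(np.recarray)

# Item access works on both objects.
a_from_plain = x['a']
a_from_rec_item = rx['a']

# Attribute access is what the recarray subclass adds.
a_from_rec_attr = rx.a

# The plain ndarray does not expose fields as attributes.
plain_has_attr = hasattr(x, 'a')
```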
My favorite way to get a np.recarray is to define a standard ndarray
with a structured ("complex") dtype, and then take a view of it as a
recarray.
Example:
>>> np.array([(1,10),(2,20)],dtype=[('a',int),('b',int)]).view(np.recarray)
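One consequence of taking a view, sketched here on the assumption of a
reasonably recent numpy (np.shares_memory is missing from very old
releases): the data is not copied, so a change made through the
recarray is visible in the original array.

```python
import numpy as np

x = np.array([(1, 10), (2, 20)], dtype=[('a', int), ('b', int)])
rx = x.view(np.recarray)

# The view and the original array refer to the same buffer.
same_buffer = np.shares_memory(x, rx)

# Writing through the recarray's attribute changes the original data.
rx.a[0] = 99
value_seen_from_x = x['a'][0]
```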
>
> 2) The individual "records" in numpy.rec array have the
> "numpy.record" type. The individual records in the numpy.array
> approach have "numpy.void" type. Can you tell us a little about
> how these differ, and what the advantages of one vs the other is?
Mmh:
>>> x = np.array([(1,10),(2,20)],dtype=[('a',int),('b',int)])
>>> rx = x.view(np.recarray)
>>> type(x[0])
<type 'numpy.void'>
>>> type(rx[0])
<type 'numpy.void'>
Which numpy version are you using?
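The exact type reported for a single record can differ between numpy
versions, so the sketch below does not rely on it. It only checks the
relationship between the two types: numpy.record is a subclass of
numpy.void, so a record taken from either kind of array passes an
isinstance test against numpy.void.

```python
import numpy as np

x = np.array([(1, 10), (2, 20)], dtype=[('a', int), ('b', int)])
rx = x.view(np.recarray)

# numpy.record derives from numpy.void.
record_is_void = issubclass(np.record, np.void)

# A single element of either array is therefore a numpy.void instance,
# whichever concrete type this numpy version reports for it.
plain_is_void = isinstance(x[0], np.void)
rec_is_void = isinstance(rx[0], np.void)

# Field access by name works on a single record from either array.
first_a_plain = x[0]['a']
first_a_rec = rx[0]['a']
```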
> 3) We've heard talk about "complex data types" in numpy in general.
> Is there some good place we can read about this more extensively?
I think the proper term is 'structured data type', or 'structured
array'.
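For orientation, a small sketch of what a structured data type looks
like when built and inspected on its own (the field names and types
are made up for the example):

```python
import numpy as np

# A structured dtype is a sequence of (name, type) pairs.
dt = np.dtype([('a', int), ('b', float)])

# The field names, in order.
field_names = dt.names

# dt.fields maps each name to (dtype, byte offset); the first field
# of this packed dtype starts at offset 0.
offset_of_a = dt.fields['a'][1]

# An array using the dtype; each field can be read or set by name.
arr = np.zeros(3, dtype=dt)
arr['b'] = 1.5
```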
>
>
> Also: one thing we use and like about the numpy.rec constructors is
> that they can take a "names" argument, and the constructor function
> does some inferring about what the formats you want are, e.g.:
>
> img = numpy.rec.fromrecords([(0,1,'a'),(2,0,'b')], names =
> ['A','B','C'])
>
> produces:
>
> rec.array([(0, 1, 'a'), (2, 0, 'b')], dtype=[('A', '<i4'),
> ('B', '<i4'), ('C', '|S1')])
>
> This is very convenient.
>
> My immediate guess for the equivalent thing with the numpy.ndarray
> approach:
>
> img = numpy.array([(0,1,'a'),(2,0,'b')], names = ['A','B','C'])
>
> does not work. Is there some syntax for doing this?
You have to construct your dtype explicitly, as in "dtype=[('A',
'<i4'), ('B', '<i4'), ('C', '|S1')]".
np.rec.fromrecords processes the array and tries to guess the best type
for each field, but it's slow and not always correct (what if your
third field should have been '|S3'?)
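A sketch of the two approaches side by side, using the field names
from the example above and the '|S3' width as the case where guessing
falls short. The variable names are illustrative, and the strings are
written as bytes so the snippet also runs on Python 3. The dtype that
fromrecords infers depends on platform and version, so it is not
spelled out here.

```python
import numpy as np

records = [(0, 1, b'a'), (2, 0, b'b')]

# Explicit dtype: you decide that the third field holds up to 3 bytes.
explicit_dtype = [('A', '<i4'), ('B', '<i4'), ('C', '|S3')]
img = np.array(records, dtype=explicit_dtype)

# A longer string fits because the width was chosen up front.
img['C'][0] = b'abc'

# Inferred dtype: fromrecords only sees the data it is given, so the
# string width is guessed from the records themselves.
guessed = np.rec.fromrecords(records, names=['A', 'B', 'C'])
```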