[Numpy-discussion] Using ndarray for 2-dimensional, heterogeneous data

Francesc Altet faltet at carabos.com
Thu Feb 9 23:40:03 EST 2006

On Fri, 10 Feb 2006 at 06:11 +0000, N. Volbers wrote:
> N. Volbers wrote:
> Sorry, I meant of course
>    b = numpy.array( [(1,2.3), (2, 17.2), (3, 19.1), (4, 22.2)], 
> dtype={'names':['col1','col2'], 'formats': ['i2','f4']})
> > Row operations are much easier now, because I can use numpy's 
> > intrinsic capabilities. However column operations require to create a 
> > new array based on the old one.

Yes, but this should be a pretty fast operation, as no data copy is
implied in doing b['col1'], for example.
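
A minimal sketch of how one might check this, assuming a current NumPy
imported as np (the names col1, shares and first_row_col1 are only for
illustration). It shows that the column shares memory with b and that
writing through it changes b:

```python
import numpy as np

# Same record array as in the quoted message.
b = np.array(
    [(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']},
)

col1 = b['col1']                    # field access: a view on b's buffer
shares = np.shares_memory(col1, b)  # True if no separate copy was made

col1[0] = 100                       # write through the view...
first_row_col1 = int(b['col1'][0])  # ...and read the change back from b

print(shares, first_row_col1)
```

If a private copy is wanted instead, b['col1'].copy() makes one.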

> > Now I am wondering if the use of such an array has more drawbacks that 
> > I am not aware of. E.g. is it possible to mask values in such an array?

I'm not familiar with masked arrays, but my understanding is that such
column arrays are the same as regular arrays, so I'd say yes.
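
As a hedged illustration only: with the numpy.ma module of current NumPy
releases (which may differ from what was available when this was written),
masking a single column could look like the sketch below. The threshold
of 20 is an arbitrary value picked for the example.

```python
import numpy as np
import numpy.ma as ma

b = np.array(
    [(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']},
)

# Mask every entry of col2 that is greater than 20 (arbitrary threshold).
col2 = ma.masked_greater(b['col2'], 20.0)

n_valid = col2.count()        # number of unmasked entries
mean_valid = col2.mean()      # mean over the unmasked entries only

print(col2, n_valid, mean_valid)
```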

> > And is it slower to get a certain column by using b['col1'] than it 
> > would using a homogeneous array c and the notation c[:,0]?

Well, you should do some benchmarks, but I'd be surprised if there is a
big speed difference.
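
A rough benchmark sketch using the standard timeit module. The array size
and repeat counts are arbitrary assumptions, and the absolute numbers will
depend on the machine and the NumPy version, so no result is claimed here:

```python
import timeit
import numpy as np

n = 100_000  # arbitrary size for the test arrays

# Heterogeneous record array, accessed by field name.
b = np.zeros(n, dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']})
# Homogeneous 2-D array, accessed by column slice.
c = np.zeros((n, 2), dtype='f4')

# Cost of just obtaining the column.
t_field = timeit.timeit(lambda: b['col1'], number=10_000)
t_slice = timeit.timeit(lambda: c[:, 0], number=10_000)

# Cost of actually using the column (a reduction over it).
t_field_sum = timeit.timeit(lambda: b['col2'].sum(), number=200)
t_slice_sum = timeit.timeit(lambda: c[:, 1].sum(), number=200)

print("access: b['col1'] %.4fs   c[:, 0] %.4fs" % (t_field, t_slice))
print("sum:    b['col2'] %.4fs   c[:, 1] %.4fs" % (t_field_sum, t_slice_sum))
```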

> > Does anyone else use such a data layout and can report on problems 
> > with it?

I use column data *a lot* in numarray and have had no problems with it.
With NumPy things should be similar in terms of stability.

> The mathematical operations I want to use will be limited to operations 
> acting on the column e.g. creating a new column = b['col1'] + b['col2'] 
> and such. So of course I am aware of the basic difference that slicing 
> works different if I have an heterogeneous array due to the fact that 
> each row is considered a single item.

Exactly, these array columns are the same as regular homogeneous
arrays. The only difference is that there is a 'hole' between elements.
However, this is handled internally by NumPy through the use of the
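
A small sketch of the column operation and of the 'hole' between elements,
assuming a current NumPy. There, the byte distance between consecutive
elements of a column is reported by the array's strides attribute; the
variable names are only for illustration.

```python
import numpy as np

b = np.array(
    [(1, 2.3), (2, 17.2), (3, 19.1), (4, 22.2)],
    dtype={'names': ['col1', 'col2'], 'formats': ['i2', 'f4']},
)

# Column arithmetic works as with ordinary arrays.
new_col = b['col1'] + b['col2']

record_size = b.dtype.itemsize         # bytes per row (2 + 4, unpadded)
col2_itemsize = b['col2'].itemsize     # bytes per col2 element
col2_strides = b['col2'].strides       # step in bytes from one row to the next

# The step is a whole record, larger than one element: that gap is the hole.
is_contiguous = b['col2'].flags['C_CONTIGUOUS']

print(new_col, record_size, col2_itemsize, col2_strides, is_contiguous)
```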

My 2 cents,

>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
