[Numpy-discussion] another view puzzle

Wed Jun 3 18:53:55 EDT 2009

On Wed, Jun 3, 2009 at 5:57 PM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:
> josef.pktd at gmail.com wrote:
>> I'm very happy with plain numpy arrays, but to handle different data
>> types in scipy.stats, I'm still trying to figure out how views and
>> structured arrays work. And I'm still confused.
>
> OK, I'd stay away from matrix then, no need to add that confusion
>
>> From the use for data handling in, for example, matplotlib and the
>> recarray functions, I thought of structured arrays (and recarrays) as
>> columns of data. Instead the analogy to database records and (1d)
>> arrays of structs as in matlab might be better.
>
> they are a bit of a mixture -- I think the record style access:
>
> arr['x']
>
> means that there are no "rows" or "columns", just data accessed by name.
>
>> The numpy help and documentation are not exactly rich in examples of how
>> to convert structured arrays to something that can be used for
>> calculation, except for dictionary access and row iteration. And using
>> views to access them is not as foolproof as I thought.
>
> views are kind of a low-level trick -- what views do is let you make
> more than one numpy array that share the same memory data block. Doing
> this requires a bit of knowledge about how data is stored in memory.
>
> For the common use case, what I do is use struct arrays to store and
> pass data around, and simply pull out the data into a regular array to
> manipulate it:
>
> In [45]: x
> Out[45]:
> array([(0.0, 1.0, 2.0, 12.0, 4.0), (1.0, 2.0, 3.0, 45.0, 5.0)],
>       dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4'),
>              ('e', '<f4')])
>
> In [46]:
>
> In [47]: e = x['e']
>
> In [48]: e
> Out[48]: array([ 4.,  5.], dtype=float32)
>
> note that this is still a "view" into the original array:
>
> In [49]: e *= 5
>
> In [50]: x
> Out[50]:
> array([(0.0, 1.0, 2.0, 12.0, 20.0), (1.0, 2.0, 3.0, 45.0, 25.0)],
>       dtype=[('a', '<f4'), ('b', '<f4'), ('c', '<f4'), ('d', '<f4'),
>              ('e', '<f4')])
>
> #( see how the e field changed )
>
> This is interesting:
> In [51]: x[0]
> Out[51]: (0.0, 1.0, 2.0, 12.0, 20.0)
>
> In [52]: type(x[0])
> Out[52]: <type 'numpy.void'>
>
> What's a numpy.void type? I thought this would be a tuple, or a numpy
> scalar of that dtype. It can be indexed either way, though:
>
> In [70]: x[0][2]
> Out[70]: 2.0
>
> In [72]: x[0]['c']
> Out[72]: 2.0
>
> cool.
>
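
As a minimal sketch of the behaviour shown in the quoted session (the array
below is hypothetical data with the same five float32 fields, and
np.shares_memory assumes a reasonably recent numpy), field access hands back
a view, and a single record is a numpy.void scalar that can be indexed by
position or by name:

```python
import numpy as np

# Hypothetical data mirroring the five float32 fields in the session above.
dt = np.dtype([(name, '<f4') for name in 'abcde'])
x = np.array([(0, 1, 2, 12, 4), (1, 2, 3, 45, 5)], dtype=dt)

e = x['e']                       # field access returns a view, not a copy
e *= 5                           # the in-place change is visible through x
shares = np.shares_memory(e, x)  # True when both use the same buffer

row = x[0]                       # one record is a numpy.void scalar
by_position = row[2]             # positional access to the third field
by_name = row['c']               # the same field, accessed by name
```

If an independent array is wanted instead, x['e'].copy() breaks the link to
the original buffer.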

Accessing rows, a single column, or individual entries looks good, but
getting slices out of a structured array takes more effort, and it
took me some time to figure this out:

>>> z
array([[(0.0, 1.0, 2.0, 3.0, 4.0)],
       [(1.0, 2.0, 3.0, 4.0, 5.0)]],
      dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8'), ('d', '<f8'),
             ('e', '<f8')])

view works when all fields have the same dtype (I always forget that
reshape is required):

>>> z.view(float).reshape(-1,len(z.dtype))
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.,  5.]])
>>> z.view(float).reshape(-1,len(z.dtype)).mean(1)
array([ 2.,  3.])
>>> z.view(float).reshape(-1,len(z.dtype))[:,2:4].sum(0)
array([ 5.,  7.])
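
For reference, here is the same view-plus-reshape step as a self-contained
sketch. The array is rebuilt from the values printed above; the variable
names are only illustrative. It relies on every field being float64, so the
buffer can be reinterpreted without copying:

```python
import numpy as np

# Rebuild z as printed above: shape (2, 1), five float64 fields.
dt = np.dtype([(name, '<f8') for name in 'abcde'])
z = np.array([[(0, 1, 2, 3, 4)], [(1, 2, 3, 4, 5)]], dtype=dt)

# All fields share one dtype, so the memory can be viewed as plain floats.
# The reshape gives one row per record and one column per field.
plain = z.view(np.float64).reshape(-1, len(z.dtype.names))

row_means = plain.mean(axis=1)        # mean over the fields of each record
col_sums = plain[:, 2:4].sum(axis=0)  # sums of fields 'c' and 'd'
```

Since plain is still a view, writing into it also changes z.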

If not all fields in the structured array have the same dtype, I didn't
find anything better than stacking the selected fields:

>>> np.hstack([z[i] for i in z.dtype.names[2:4]])
array([[ 2.,  3.],
       [ 3.,  4.]])

>>> np.dot(np.hstack([z[i] for i in z.dtype.names[2:4]]), np.ones(len(z)))
array([ 5.,  7.])
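
A sketch of the mixed-dtype case, using a hypothetical array with int32 and
float64 fields (not the z above). Stacking the fields copies the data and
upcasts to a common type. As an assumption about the installed version,
newer numpy releases (1.16 and later) also provide
numpy.lib.recfunctions.structured_to_unstructured, which does the same
conversion in one call:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical mixed-dtype array: a plain view(float) would not work here.
dt = np.dtype([('a', '<i4'), ('b', '<f8'), ('c', '<i4'), ('d', '<f8')])
m = np.array([(0, 1.0, 2, 3.0), (1, 2.0, 3, 4.0)], dtype=dt)

names = m.dtype.names[2:4]   # ('c', 'd')

# Stack the selected fields as columns; this copies and upcasts to float64.
stacked = np.column_stack([m[n] for n in names])

# Same result via recfunctions (assumes numpy >= 1.16).
converted = rfn.structured_to_unstructured(m[list(names)])

col_sums = stacked.sum(axis=0)   # column sums of 'c' and 'd'
```

Unlike the view approach, neither result shares memory with m, so changes to
stacked or converted do not propagate back.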

Is len(z.dtype) > 0 the best way to find out whether an array has a
structured dtype?
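
One alternative I am aware of, sketched here with a hypothetical helper
name: dtype.names is None for plain dtypes and a tuple of field names for
structured ones, so the check can be written without relying on len():

```python
import numpy as np

def is_structured(arr):
    """Return True if arr has a structured (record) dtype."""
    # dtype.names is None for plain dtypes, a tuple of names otherwise.
    return arr.dtype.names is not None

plain = np.arange(3.0)
structured = np.zeros(2, dtype=[('a', '<f8'), ('b', '<i4')])

plain_is_structured = is_structured(plain)        # plain float array
struct_is_structured = is_structured(structured)  # two named fields
```

Both tests should agree for ordinary arrays; the dtype.names form just makes
the intent more explicit.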

Josef