How to specify multiple dtypes in numpy.ndarray?

Thu Dec 19 06:39:18 EST 2019

On 19/12/2019 11.52, lampahome wrote:
> I meet performance is low when I use struct.unpack to unpack binary data.
>
> So I tried to use numpy.ndarray
> But meet error when I want to unpack multiple dtypes
>
> Can anyone teach me~
>
> Code like below:
> # python3
> import struct
> import numpy as np
> s1 = struct.Struct("@QIQ")
> ss1 = s1.pack(1,11,111)
> np.ndarray((3,), [('Q','I','Q')], ss1)
> # ValueError: mismatch in size of old and new data-descriptor.

A numpy array always has ONE dtype for ALL array elements.

If you read an array of structs, you can define a structured type, where
each element of your struct must have a name.

The error you're seeing is (as you know) because you're not setting up
your dtype in the right way. Let's fix it:

> In [2]: np.dtype([('Q', 'I', 'Q')])
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-2-cecc70c78408> in <module>
> ----> 1 np.dtype([('Q', 'I', 'Q')])
>
> ValueError: mismatch in size of old and new data-descriptor
>
> In [3]: np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')])
> Out[3]: dtype([('field1', '<u8'), ('field2', '<u4'), ('field3', '<u8')])

... and now let's put it all together!

import struct
import numpy as np

s1 = struct.Struct("@QIQ")
ss1 = s1.pack(1, 11, 111)
struct_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')])
a = np.frombuffer(ss1, dtype=struct_dtype)

I'm using the frombuffer() function deliberately so I don't have to
figure out the shape of the final array (which is (1,), not (3,), by the
way).

And hey presto: it raises an exception!

> ValueError: buffer size must be a multiple of element size

Your example shows a difference between the default behaviour of numpy's
structured dtype and the struct module: packing! By default, numpy
structured dtypes are closely packed, i.e. nothing is aligned to useful
memory boundaries.

struct_dtype.itemsize == 20

The struct module, on the other hand, tries to guess where the C
compiler would put its padding.

len(ss1) == 24
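
If you want to see the two sizes side by side, something like the
following should do. It is only a sketch: the variable names are mine,
and the padded size depends on the platform (24 is what I'd expect on a
typical 64-bit machine). It also shows that data packed with "=" (native
byte order, but no padding) matches numpy's default packed dtype.

```python
import struct
import numpy as np

# Default structured dtype: fields are packed with no padding.
packed_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')])

native_size = struct.calcsize("@QIQ")    # padded like a C struct (platform dependent)
standard_size = struct.calcsize("=QIQ")  # 8 + 4 + 8 = 20, no padding
print(native_size, standard_size, packed_dtype.itemsize)

# Data packed without padding can be read with the packed dtype directly.
ss_packed = struct.pack("=QIQ", 1, 11, 111)
b = np.frombuffer(ss_packed, dtype=packed_dtype)
print(b)
```

That only helps if you control how the data is written, though. Your
data is already in the padded native layout.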

We can tell numpy to do the same:

struct_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')],
                        align=True)

and then

a = np.frombuffer(ss1, dtype=struct_dtype)

works and produces

array([(1, 11, 111)],
      dtype={'names':['field1','field2','field3'],
             'formats':['<u8','<u4','<u8'],
             'offsets':[0,8,16],
             'itemsize':24,
             'aligned':True})

with a .shape of (1,)
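
Since your original concern was the speed of struct.unpack, here is a
rough sketch of how the same dtype reads many records in one call. The
three records are made up for illustration, and I'm assuming the records
sit back to back in the buffer in the padded 24-byte layout.

```python
import struct
import numpy as np

s1 = struct.Struct("@QIQ")
struct_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')],
                        align=True)

# Three made-up records, packed one after the other.
records = [(1, 11, 111), (2, 22, 222), (3, 33, 333)]
buf = b"".join(s1.pack(*r) for r in records)

# One call interprets the whole buffer; no Python-level loop over records.
a = np.frombuffer(buf, dtype=struct_dtype)
print(a.shape)
print(a['field2'])   # one field across all records
```

Each field comes back as an ordinary array (a['field2'] is a uint32
array here), so the usual vectorised operations apply.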

#####

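It's worth noting that in your example, each of the three fields sits in
its own 8-byte slot (the 4-byte field2 is followed by 4 bytes of
padding). As long as those padding bytes are zero, this means that on a
little-endian machine you could quite simply have interpreted the data
as an array of uint64s instead: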

In [30]: np.frombuffer(ss1, dtype='u8')
Out[30]: array([  1,  11, 111], dtype=uint64)
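
To see why this shortcut works, you can look at the raw bytes of the
middle slot. This is just a sketch and assumes the padded 24-byte layout
and a little-endian machine; as far as I know the struct module fills
its padding with zero bytes.

```python
import struct
import sys
import numpy as np

ss1 = struct.pack("@QIQ", 1, 11, 111)

# Assumes the 24-byte layout: bytes 8..15 hold the 4-byte field2
# followed by 4 padding bytes.
slot = ss1[8:16]
print(sys.byteorder, slot.hex())

# Read every 8-byte slot as one unsigned 64-bit integer (native byte order).
as_u64 = np.frombuffer(ss1, dtype='u8')
print(as_u64)
```

On a big-endian machine the padding would land in the low-order bytes of
the slot, so I'd expect the middle value to come out as 11 * 2**32
rather than 11.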

-- Thomas