How to specify multiple dtypes in numpy.ndarray?
Thomas Jollans
tjol at tjol.eu
Thu Dec 19 06:39:18 EST 2019
On 19/12/2019 11.52, lampahome wrote:
> struct.unpack is slow when I use it to unpack binary data.
>
> So I tried numpy.ndarray instead,
> but I get an error when I try to unpack multiple dtypes.
>
> Can anyone help?
>
> Code like below:
> # python3
> import struct
> import numpy as np
> s1 = struct.Struct("@QIQ")
> ss1 = s1.pack(1,11,111)
> np.ndarray((3,), [('Q','I','Q')], ss1)
> # ValueError: mismatch in size of old and new data-descriptor.
A numpy array always has ONE dtype for ALL array elements.
To read an array of structs, you can define a structured dtype, in which
every field of the struct must have a name.
The error you're seeing is (as you know) because you're not setting up
your dtype in the right way. Let's fix it:
> In [2]: np.dtype([('Q', 'I', 'Q')])
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-2-cecc70c78408> in <module>
> ----> 1 np.dtype([('Q', 'I', 'Q')])
>
> ValueError: mismatch in size of old and new data-descriptor
>
> In [3]: np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')])
> Out[3]: dtype([('field1', '<u8'), ('field2', '<u4'), ('field3', '<u8')])
... and now let's put it all together!
s1 = struct.Struct("@QIQ")
ss1 = s1.pack(1,11,111)
struct_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')])
a = np.frombuffer(ss1, dtype=struct_dtype)
I'm using the frombuffer() function deliberately so I don't have to
figure out the shape of the final array (which is (1,), not (3,), by the
way).
And hey presto: it raises an exception!
> ValueError: buffer size must be a multiple of element size
Your example shows a difference between the default behaviour of numpy's
structured dtype and the struct module: packing! By default, numpy
structured dtypes are closely packed, i.e. nothing is aligned to useful
memory boundaries.
struct_dtype.itemsize == 20
The struct module, on the other hand, when used with the native "@" format
as here, inserts padding where the C compiler would put it.
len(ss1) == 24
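The two sizes can be compared directly before touching any data. This is only a small sketch; the name packed_dtype is mine, and the value reported by struct.calcsize depends on the platform's native alignment (24 on a typical 64-bit machine).

```python
import struct
import numpy as np

# Default structured dtype: fields are stored back to back, no padding.
packed_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')])

numpy_size = packed_dtype.itemsize       # 8 + 4 + 8 bytes
struct_size = struct.calcsize("@QIQ")    # native sizes plus alignment padding

print("numpy itemsize:", numpy_size)
print("struct size:   ", struct_size)
```

If the two numbers differ, frombuffer() cannot split the buffer into whole elements, which is exactly what the exception says.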
We can tell numpy to do the same:
struct_dtype = np.dtype([('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')], align=True)
and then
a = np.frombuffer(ss1, dtype=struct_dtype)
works and produces
array([(1, 11, 111)],
      dtype={'names':['field1','field2','field3'],
             'formats':['<u8','<u4','<u8'],
             'offsets':[0,8,16],
             'itemsize':24,
             'aligned':True})
with a .shape of (1,)
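Since the original complaint was about speed, the payoff comes when the buffer holds many records: frombuffer() decodes them all in one call. The following is a rough sketch with made-up sample values; the names record, record_dtype, rows and records are mine, and it assumes the struct size and the aligned dtype's itemsize agree on your platform (the assert checks this).

```python
import struct
import numpy as np

record = struct.Struct("@QIQ")
record_dtype = np.dtype(
    [('field1', 'Q'), ('field2', 'I'), ('field3', 'Q')], align=True
)

# Guard: both sides must agree on the size of one record.
assert record.size == record_dtype.itemsize

# Three made-up records packed back to back, standing in for real input.
rows = [(1, 11, 111), (2, 22, 222), (3, 33, 333)]
buf = b"".join(record.pack(*row) for row in rows)

# One call decodes every record; no Python-level loop over struct.unpack.
records = np.frombuffer(buf, dtype=record_dtype)

print(records.shape)        # one array element per record
print(records['field2'])    # a whole column, selected by field name
```

The array returned by frombuffer() is a read-only view of the bytes object; call records.copy() if you need to modify it.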
#####
It's worth noting that in your example, all three fields start on 8-byte
boundaries and the padding bytes written by struct.pack are zero, meaning
that on a little-endian machine, you could quite simply have interpreted
the data as an array of uint64s instead:
In [30]: np.frombuffer(ss1, dtype='u8')
Out[30]: array([ 1, 11, 111], dtype=uint64)
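That shortcut depends on the native layout of the machine that wrote the data. If you control the writing side, one alternative (my suggestion, not something from the original question) is to fix the layout explicitly on both ends: the "<" prefix makes struct use little-endian standard sizes with no padding, which matches numpy's default packed layout when the byte order is spelled out. The names wire and wire_dtype below are mine.

```python
import struct
import numpy as np

# "<": little-endian, standard sizes, no alignment padding.
wire = struct.Struct("<QIQ")

# Explicit byte order and the default packed layout on the numpy side.
wire_dtype = np.dtype([('field1', '<u8'), ('field2', '<u4'), ('field3', '<u8')])

buf = wire.pack(1, 11, 111)
a = np.frombuffer(buf, dtype=wire_dtype)

print(wire.size, wire_dtype.itemsize)   # both sides agree on the record size
print(a)
```

With this layout a record is 20 bytes, so the plain 'u8' view above no longer applies; the structured dtype is needed.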
-- Thomas