[Numpy-discussion] reading big-endian uint16 into array on little-endian machine

Fri Jun 18 10:39:57 EDT 2010

On Fri, Jun 18, 2010 at 6:15 AM, Sturla Molden <sturla at molden.no> wrote:

> On 17.06.2010 16:29, greg whittier wrote:
> > I have files (from an external source) that contain ~10 GB of
> > big-endian uint16's that I need to read into a series of arrays.  What
> > I'm doing now is
> >
> > import numpy as np
> > import struct
> >
> > fd = open('file.raw', 'rb')
> >
> > for n in range(10000):
> >      count = 1024*1024
> >      a = np.array([struct.unpack('>H', fd.read(2))[0]
> >                    for i in range(count)])
> >      # do something with a
> >
> > It doesn't seem very efficient to call struct.unpack one element at a
> > time, but struct doesn't have an unpack_farray version like xdrlib
> > does.  I also thought of using the array module and .byteswap() but
> > the help says it only works on 4 and 8 byte arrays.
> >
> > Any ideas?
> >
> >
>
> It's just a matter of swapping the bytes:
>
> # arr is a 1D array of uint16
> bytes = arr.view(dtype=np.uint8)
> tmp = bytes[::2].copy()
> bytes[::2] = bytes[1::2]
> bytes[1::2] = tmp
>
>
> Or like this:
>
> # arr is a 1D array of uint16
> arr = (arr >> 8) | (arr << 8)
>
>
> The latter generates three temporary arrays; the former generates one.
>
> You can avoid this with C:
>
> __declspec(dllexport)
> void byteswap(unsigned short *arr, int n)
> {
>     for (int i=0; i<n; i++) {
>         arr[i] = (arr[i] >> 8) | (arr[i] << 8);
>     }
> }
>
>
>
> Sturla
>
>
Just for my own knowledge, would Robert's suggestion of using '>i2' as the
dtype be considered the "best" solution, mostly because of its simplicity,
but also because it does not assume the endian-ness of the host computer?
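
A minimal sketch of what that might look like, assuming the values are
unsigned (so '>u2' rather than '>i2', since the files hold uint16's). The
helper name read_be_uint16_chunks and the small demo file are hypothetical,
made up only for illustration:

```python
import os
import tempfile

import numpy as np


def read_be_uint16_chunks(path, count=1024 * 1024):
    """Yield successive arrays of big-endian uint16 values read from path."""
    with open(path, "rb") as fd:
        while True:
            # '>u2' means big-endian unsigned 16-bit, whatever the host order.
            a = np.fromfile(fd, dtype=">u2", count=count)
            if a.size == 0:
                break
            yield a


# Tiny demo file (hypothetical data) written in big-endian byte order.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.raw")
np.array([1, 258, 65535], dtype=">u2").tofile(demo_path)

chunks = list(read_be_uint16_chunks(demo_path, count=2))

# Optional: convert a chunk to the host's native byte order.
native = chunks[0].astype(np.uint16)
```

The arrays keep the '>u2' dtype, so arithmetic on them already gives the
right values; the astype() call is only needed if native byte order is
wanted, e.g. before handing the data to C code.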

Ben Root