shoehorn c-structured data into Numpy

MRAB python at mrabarnett.plus.com
Sun Jun 14 13:17:17 EDT 2009


Helmut Fritz wrote:
> 
> Hello there everyone, I used to be on this a long time ago but then I 
> got so much spam I gave up.
> 
> But this strategy has come a little unstuck.  I have binary output from 
> a Fortran program that is in a big-endian C-structured binary file.  The 
> output can be very variable and many options create different orderings 
> in the binary file. So I'd like to keep the header-reading in python.
> 
> Anyhoo, I've so far been able to read the output with the struct 
> module.  But my question is how do I create numpy arrays from the bits 
> of the file I want?
> 
> So far I've been able to scan through to the relevant sections and I've 
> tried all manner of idiotic combinations...
> 
> The floats are 4 bytes for single precision, and it's an unstructured 
> grid from a finite difference scheme, so I know the number of cells 
> (ncells) for the property I am looking to extract.
> 
> So I've tried:
> TC1 = np.frombuffer(struct.unpack(">%df" % ncells, data.read(4*ncells))[0], dtype=float)
> Only to get a very logical:
>  >>> Traceback (most recent call last):
>  >>>   File "a2o.py", line 466, in <module>
>  >>>     runme(me)
>  >>>   File "a2o.py", line 438, in runme
>  >>>     me.spmapdat(data)
>  >>>   File "a2o.py", line 239, in spmapdat
>  >>>     TC1 = np.frombuffer(struct.unpack(">%df" % ncells, 
> data.read(4*ncells))[0], dtype=float)
>  >>> AttributeError: 'float' object has no attribute '__buffer__'
> 
This:

     struct.unpack(">%df" % ncells, data.read(4*ncells))

unpacks to a tuple of floats, and the trailing [0] picks out the first
(actually the zeroth) one. That leaves you with a plain Python float,
which has no buffer interface - hence the AttributeError. You probably
didn't want to do that! :-)

Try:

    TC1 = np.array(struct.unpack(">%df" % ncells, data.read(4 * ncells)), dtype=float)
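A quick round-trip check of that line, with a BytesIO of made-up values standing in for your Fortran file (the names and data here are mine, not from your program):

```python
import io
import struct

import numpy as np

# Fake a big-endian binary stream like the Fortran output.
ncells = 4
values = [0.5, 1.5, 2.5, 3.5]
data = io.BytesIO(struct.pack(">%df" % ncells, *values))

# unpack gives a tuple of ncells Python floats; np.array turns the
# whole tuple into a float64 array instead of indexing out one element.
TC1 = np.array(struct.unpack(">%df" % ncells, data.read(4 * ncells)), dtype=float)
```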

> ok... so I'll feed frombuffer my data file...
> 
> And then tried:
> TC1 = np.frombuffer(data.read(4*ncells), dtype=float, count=ncells)
>  >>> Traceback (most recent call last):
>  >>>   File "a2o.py", line 466, in <module>
>  >>>     runme(me)
>  >>>   File "a2o.py", line 438, in runme
>  >>>     me.spmapdat(data)
>  >>>   File "a2o.py", line 240, in spmapdat
>  >>>     TC1 = np.frombuffer(data.read(4*ncells), dtype=float, count=ncells)
>  >>> ValueError: buffer is smaller than requested size
> 
> And THEN I tried:
> TC1 = np.frombuffer(data.read(4*ncells), dtype=float, count=4*ncells)
>  >>> Traceback (most recent call last):
>  >>>   File "a2o.py", line 466, in <module>
>  >>>     runme(me)
>  >>>   File "a2o.py", line 438, in runme
>  >>>     me.spmapdat(data)
>  >>>   File "a2o.py", line 240, in spmapdat
>  >>>     TC1 = np.frombuffer(data.read(4*ncells), dtype=float, 
> count=4*ncells)
>  >>> ValueError: buffer is smaller than requested size
> 
> But it's the right size - honest.
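The buffer is the right size for 4-byte floats, but dtype=float means
NumPy's float64, which is 8 bytes per value - so count=ncells asks for
8*ncells bytes from a 4*ncells-byte buffer. Telling frombuffer the real
layout (big-endian 4-byte floats, '>f4') should do it; a sketch, again
with made-up data in place of the real file:

```python
import io
import struct

import numpy as np

# Fake the big-endian single-precision section of the file.
ncells = 5
values = [1.0, 2.5, -3.0, 4.25, 0.5]
data = io.BytesIO(struct.pack(">%df" % ncells, *values))

# '>f4' = big-endian 4-byte float, matching the file's layout, so
# 4*ncells bytes is exactly the size frombuffer expects.
TC1 = np.frombuffer(data.read(4 * ncells), dtype=">f4", count=ncells)
```

This skips the struct.unpack round trip entirely, which should also be
faster for large ncells.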
> 
> (In general) I should be able to put these arrays into memory with no 
> problems.  Certainly given the rate at which I'm turning around this 
> code... Memory may be in the terabytes once I'm done.
> 
> Anyone got a Sesame Street answer for this?
> 
> Many thanks!  Helmut.
> 



More information about the Python-list mailing list