[Numpy-discussion] Efficient reading of binary data

Robert Kern robert.kern at gmail.com
Thu Apr 3 20:00:34 EDT 2008


On Thu, Apr 3, 2008 at 6:53 PM, Nicolas Bigaouette
<nbigaouette at gmail.com> wrote:
> Thanx for the fast response Robert ;)
>
> I changed my code to use the slice:
>  E = data[6::9]
> It is indeed faster and uses less memory. Great.
>
> Thanx for the endianness tip! I knew there was something like this ;) I
> suspect that, in '>f8', "f" means float and "8" means 8 bytes?

Yes, and the '>' means big-endian. '<' is little-endian, and '=' is
native-endian.
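
As a quick illustration of those dtype strings, here is a minimal sketch. The
sample value 1.5 and the use of the struct module to build the raw bytes are
assumptions made for the example, not part of the original code.

```python
import struct
import numpy as np

# The three byte-order prefixes for an 8-byte float.
big = np.dtype('>f8')      # big-endian
little = np.dtype('<f8')   # little-endian
native = np.dtype('=f8')   # whatever the host machine uses

# Pack 1.5 as a big-endian C double (8 bytes), as a file might store it.
raw = struct.pack('>d', 1.5)

# Interpreting the bytes with the matching byte order recovers the value.
value_big = np.frombuffer(raw, dtype=big)[0]

# Interpreting the same bytes as little-endian gives a different number.
value_little = np.frombuffer(raw, dtype=little)[0]

print(big.kind, big.itemsize)   # 'f' for float, 8 bytes per item
print(value_big, value_little)
```

Picking the wrong prefix does not raise an error; it silently yields wrong
numbers, so it is worth checking one known value after reading.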

> From some benchmarks, I see that the slowest thing is disk access. It can
> slow the displaying of data from around 1sec (when data is in os cache or
> buffer) to 8sec.
>
> So the next step would be to only read the needed data from the binary
> file... Is it possible to read from a file with a slice? So instead of:
>
> data = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot)
> E = data[6::9]
> maybe something like:
> E = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot, slice=6::9)

Instead of reading using fromfile(), you can try memory-mapping the array.

  from numpy import memmap
  E = memmap(f, dtype=float_dtype, mode='r')[6::9]

That may or may not help. At least, it should decrease the latency
before you start pulling out frames.
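
A self-contained sketch comparing the two approaches on a small synthetic
file. The file name, the value of Stot, and the arange test data are
hypothetical; only the '>f8' dtype, the 9 values per record, and the [6::9]
slice come from the thread.

```python
import os
import tempfile
import numpy as np

float_dtype = np.dtype('>f8')  # big-endian 8-byte float, as in the thread
Stot = 1000                    # hypothetical number of records
fields = 9                     # 9 values per record, as in the thread

# Write a synthetic binary file; tofile() dumps the raw bytes as stored,
# so the file holds big-endian doubles.
path = os.path.join(tempfile.mkdtemp(), 'frames.bin')  # hypothetical name
np.arange(fields * Stot, dtype=float_dtype).tofile(path)

# Approach 1: read everything into memory, then take every 9th value.
full = np.fromfile(path, dtype=float_dtype, count=fields * Stot)
E_fromfile = full[6::9]

# Approach 2: memory-map the file. Creating the map and slicing it are
# cheap; the data is only fetched from disk when elements are accessed.
mm = np.memmap(path, dtype=float_dtype, mode='r')
E_view = mm[6::9]            # strided view onto the mapped file
E_memmap = np.array(E_view)  # copying forces the actual reads

print(np.array_equal(E_fromfile, E_memmap))
```

One likely reason this "may or may not help": with a stride of 9 * 8 = 72
bytes, the selected values touch every page of the file, so the operating
system probably still reads roughly the whole file from disk. The gain is
mostly in memory use and in not having to wait for the full read up front.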

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


