[Numpy-discussion] how to pipe into numpy arrays?

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu Oct 25 02:17:42 EDT 2012


On 10/24/2012 09:00 PM, Michael Aye wrote:
> As numpy.fromfile seems to require full file-object functionality
> like seek, I cannot use it with the sys.stdin pipe.
> So how could I stream a binary pipe directly into numpy?
> I can imagine storing the data in a string and using StringIO, but the
> files are 3.6 GB of raw binary data, and that would most likely take up
> much more space as a string object.

A Python 2 string is just a bytes object, so it would take about 3.6 GB
as well (or did you mean a text encoding of the data?)
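To illustrate that point, here is a minimal sketch (Python 3, where the binary pipe is sys.stdin.buffer) that reads the whole stream into one bytes object and wraps it with np.frombuffer, which reuses the bytes' memory rather than copying it. The function name read_all_as_array is hypothetical, and io.BytesIO stands in for the pipe in the demo line:

```python
import io

import numpy as np


def read_all_as_array(f, dtype):
    """Read binary stream f to EOF and wrap the result as a 1-D array (hypothetical helper)."""
    data = f.read()  # a single bytes object holding the whole stream
    # np.frombuffer reuses data's memory instead of copying it; because
    # bytes objects are immutable, the resulting array is read-only.
    return np.frombuffer(data, dtype=dtype)


# In-memory stand-in for the pipe, holding six float64 values.
demo = read_all_as_array(io.BytesIO(np.arange(6, dtype=np.float64).tobytes()), np.float64)
```

np.frombuffer raises ValueError if the byte count is not a multiple of the dtype's item size, and calling .copy() on the result to get a writable array would double the memory use again.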

> Reading binary files on disk is NOT the problem; I would like to avoid
> the temporary file if possible.

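Read in chunks? Something like

1) Create the destination array, e.g. arr = np.empty(shape, dtype=dtype)

2)

arr_bytes = arr.reshape(-1).view(np.uint8)
# arr_bytes must share memory with arr (check with
# np.shares_memory(arr, arr_bytes)); reshape(-1) may return a
# copy if arr is not contiguous, but a fresh np.empty array is

3) With f the binary pipe (sys.stdin in Python 2, sys.stdin.buffer
in Python 3):

i = 0
while i < arr_bytes.size:
     chunk = f.read(min(chunk_size, arr_bytes.size - i))
     if not chunk:
         break  # EOF before arr was filled
     # convert the bytes chunk to uint8 before assigning it
     arr_bytes[i:i + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
     i += len(chunk)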
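The three steps above can be collected into one self-contained function. This is a sketch, assuming Python 3, data in native byte order (otherwise pass an explicit dtype such as '>f4'), and a hypothetical function name read_chunked; io.BytesIO stands in for the pipe in the demo line:

```python
import io

import numpy as np


def read_chunked(f, shape, dtype, chunk_size=1 << 20):
    """Fill a new array of the given shape/dtype from binary stream f, chunk by chunk."""
    arr = np.empty(shape, dtype=dtype)
    # Flat uint8 view of arr's memory; a fresh np.empty array is
    # C-contiguous, so reshape(-1) gives a view rather than a copy.
    arr_bytes = arr.reshape(-1).view(np.uint8)
    assert np.shares_memory(arr, arr_bytes)
    nbytes = arr_bytes.size
    i = 0
    while i < nbytes:
        chunk = f.read(min(chunk_size, nbytes - i))
        if not chunk:
            raise EOFError(f"stream ended after {i} of {nbytes} bytes")
        # Reinterpret the bytes chunk as uint8 and copy it into place.
        arr_bytes[i:i + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
        i += len(chunk)
    return arr


# In-memory stand-in for the pipe: a 3 x 4 int32 array, read 5 bytes at a time.
demo = read_chunked(io.BytesIO(np.arange(12, dtype=np.int32).tobytes()), (3, 4), np.int32, chunk_size=5)
```

On a real pipe this would be called as, e.g., read_chunked(sys.stdin.buffer, shape, dtype), with the shape and dtype known in advance. Raising EOFError on a short stream, instead of silently breaking out of the loop, avoids returning an array whose tail is uninitialized memory.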

Alternatively, one could write some C or Cython code to read directly
into the NumPy array buffer, which avoids an extra copy of the data over
the memory bus. (Unfortunately, it doesn't look like fromfile has an
out argument.)
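A pure-Python approximation of reading directly into the array buffer (a swapped-in technique, not the C/Cython approach described above) is the binary file object's readinto method, which writes into any writable buffer, NumPy arrays included. This sketch assumes Python 3 (sys.stdin.buffer for the pipe) and uses a hypothetical function name read_into. It avoids creating an intermediate bytes object per chunk, but whether the buffered file layer still copies internally is an implementation detail:

```python
import io

import numpy as np


def read_into(f, shape, dtype):
    """Fill a new array from binary stream f using f.readinto (hypothetical helper)."""
    arr = np.empty(shape, dtype=dtype)
    arr_bytes = arr.reshape(-1).view(np.uint8)  # shares memory with arr
    nbytes = arr_bytes.size
    i = 0
    while i < nbytes:
        # readinto writes into the remaining slice and returns the number
        # of bytes written; it may return fewer than requested, so loop.
        n = f.readinto(arr_bytes[i:])
        if not n:  # 0 at EOF (None only for non-blocking streams)
            raise EOFError(f"stream ended after {i} of {nbytes} bytes")
        i += n
    return arr


# In-memory stand-in for the pipe: a 3 x 4 int32 array.
demo = read_into(io.BytesIO(np.arange(12, dtype=np.int32).tobytes()), (3, 4), np.int32)
```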

Dag Sverre



More information about the NumPy-Discussion mailing list