[Numpy-discussion] how to pipe into numpy arrays?
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Thu Oct 25 02:17:42 EDT 2012
On 10/24/2012 09:00 PM, Michael Aye wrote:
> As numpy.fromfile seems to require full file object functionalities
> like seek, I can not use it with the sys.stdin pipe.
> So how could I stream a binary pipe directly into numpy?
> I can imagine storing the data in a string and use StringIO but the
> files are 3.6 GB large, just the binary, and that will most likely be
> much more as a string object.
A Python 2 string is just a bytes object and would take 3.6 GB as well
(or did you mean in text encoding?)
> Reading binary files on disk is NOT the problem, I would like to avoid
> the temporary file if possible.
Read in chunks? Something like
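1) Create array arr (C-contiguous, with its final shape and dtype)
arr_bytes = arr.reshape(-1).view(np.uint8)
# arr_bytes must share memory with arr: check that modifying
# arr_bytes modifies arr; if not, work with the reshape arguments
i = 0
while i < arr_bytes.size:
    chunk = f.read(min(chunk_size, arr_bytes.size - i))
    if not chunk:
        break  # stream ended early
    arr_bytes[i:i + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
    i += len(chunk)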
Alternatively, one could write some C or Cython code to read directly
into the NumPy array buffer, which avoids an extra copy of the data over
the memory bus. (Unfortunately, it doesn't look like "fromfile" has an
out argument.)
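
For what it's worth, a similar copy-free read may also be possible from pure Python, without C or Cython. The sketch below is not the C/Cython route described above; it swaps in the standard readinto() method of binary file objects and assumes Python 3, where the binary pipe would be sys.stdin.buffer. The helper name read_into_array is made up for illustration, and the array is assumed to be C-contiguous and preallocated with its final shape and dtype.

```python
import io

import numpy as np


def read_into_array(stream, arr):
    """Fill a preallocated C-contiguous array in place from a binary stream.

    Hypothetical helper for illustration. Returns the number of bytes read.
    """
    if not arr.flags.c_contiguous:
        # reshape(-1) would copy a non-contiguous array, so writes would be lost
        raise ValueError("arr must be C-contiguous")
    # Flat, writable byte view that shares memory with arr
    buf = memoryview(arr.reshape(-1).view(np.uint8))
    total = 0
    while total < len(buf):
        # readinto() writes into the array's memory; on a pipe it may
        # deliver fewer bytes than requested, hence the loop
        n = stream.readinto(buf[total:])
        if not n:  # 0 or None: stream ended or no data available
            break
        total += n
    return total


# Usage: an in-memory stream stands in for sys.stdin.buffer
expected = np.arange(12, dtype=np.float64).reshape(3, 4)
stream = io.BytesIO(expected.tobytes())
arr = np.empty((3, 4), dtype=np.float64)
nread = read_into_array(stream, arr)
```

If the returned byte count is smaller than arr.nbytes, the stream ended before the array was full, and the tail of the array holds whatever was there before.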