[Numpy-discussion] how to pipe into numpy arrays?

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu Oct 25 02:19:30 EDT 2012


On 10/25/2012 08:17 AM, Dag Sverre Seljebotn wrote:
> On 10/24/2012 09:00 PM, Michael Aye wrote:
>> As numpy.fromfile seems to require full file-object functionality
>> such as seek, I cannot use it with the sys.stdin pipe.
>> So how could I stream a binary pipe directly into numpy?
>> I can imagine storing the data in a string and using StringIO, but the
>> files are 3.6 GB (just the binary data), and that will most likely
>> take much more as a string object.
>
> A Python 2 string is just a bytes object and would take 3.6 GB as well
> (or did you mean in text encoding?)
>
>> Reading binary files on disk is NOT the problem, I would like to avoid
>> the temporary file if possible.
>
> Read in chunks? Something like
>
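> 1) Create array arr (e.g. with np.empty; its total size must be known
>    in advance)
>
> 2)
>
> arr_bytes = arr.reshape(-1).view(np.uint8)
> # For a C-contiguous arr, reshape(-1) returns a view, so modifying
> # arr_bytes modifies arr; np.shares_memory(arr, arr_bytes) checks this
>
> 3)
>
> i = 0
> while i < arr_bytes.size:
>      chunk = f.read(min(chunk_size, arr_bytes.size - i))
>      if not chunk:
>          break  # EOF before arr was filled
>      # bytes cannot be assigned to a uint8 slice directly; wrap it first
>      arr_bytes[i:i + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
>      i += len(chunk)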
>
> Alternatively, one could write some C or Cython code to read directly
> into the NumPy array buffer, which avoids copying the data an extra
> time over the memory bus. (That would be needed because, unfortunately,
> "fromfile" doesn't appear to have an out argument.)
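
In Python 3 there is also a pure-Python way to read directly into the array buffer: binary streams such as sys.stdin.buffer provide readinto(), which fills a writable buffer in place, so no temporary bytes object is created. This is an alternative not discussed in the thread, offered as a sketch; the function name read_into_array and the 1 MiB default chunk size are illustrative assumptions.

```python
import io

import numpy as np


def read_into_array(stream, arr, chunk_size=1 << 20):
    """Fill the C-contiguous array ``arr`` in place from a binary stream.

    Returns the number of bytes read, which is smaller than arr.nbytes
    only if the stream ended first.
    """
    if not arr.flags.c_contiguous:
        raise ValueError("arr must be C-contiguous")
    # Flat uint8 view of arr's memory: no copy, writes land directly in arr.
    buf = memoryview(arr.reshape(-1).view(np.uint8))
    pos = 0
    while pos < len(buf):
        # readinto() may fill fewer bytes than requested (e.g. on a pipe).
        n = stream.readinto(buf[pos:pos + chunk_size])
        if not n:  # 0 at EOF, None for a non-blocking stream with no data
            break
        pos += n
    return pos


if __name__ == "__main__":
    data = np.arange(10, dtype=np.float64)
    out = np.empty_like(data)
    nread = read_into_array(io.BytesIO(data.tobytes()), out, chunk_size=16)
    print(nread, np.array_equal(out, data))  # 80 True
```

For a real pipe one would pass sys.stdin.buffer as the stream and preallocate arr with the expected shape and dtype.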

Actually, as long as chunk_size is on the order of 1 MB, the per-chunk
Python overhead should be negligible, and each chunk fits in cache, so
the extra copy happens in cache rather than over the memory bus. A C
solution may therefore be overkill.
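
A minimal sketch of the chunked copy with roughly 1 MB chunks, assuming the element count and dtype are known in advance (the name read_array_chunked and its parameters are hypothetical). Each chunk goes through a temporary bytes object and is then copied into the array via np.frombuffer, i.e. this is the one-extra-copy variant; with 1 MiB chunks, a 3.6 GB stream needs only a few thousand loop iterations.

```python
import io

import numpy as np


def read_array_chunked(stream, dtype, count, chunk_size=1 << 20):
    """Read ``count`` items of ``dtype`` from a binary stream in chunks.

    Raises EOFError if the stream ends before the array is full.
    """
    arr = np.empty(count, dtype=dtype)
    arr_bytes = arr.view(np.uint8)  # flat byte view sharing arr's memory
    pos = 0
    while pos < arr_bytes.size:
        chunk = stream.read(min(chunk_size, arr_bytes.size - pos))
        if not chunk:  # b"" signals EOF
            raise EOFError(f"stream ended after {pos} of {arr_bytes.size} bytes")
        # The one extra copy: from the temporary bytes object into arr.
        arr_bytes[pos:pos + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
        pos += len(chunk)
    return arr


if __name__ == "__main__":
    src = np.arange(6, dtype=np.int16)
    got = read_array_chunked(io.BytesIO(src.tobytes()), np.int16, 6, chunk_size=4)
    print(np.array_equal(got, src))  # True
```

With a pipe this would be called as read_array_chunked(sys.stdin.buffer, np.float64, n) for some known n, followed by a reshape if the data is multidimensional.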

Dag Sverre


