reading gzip compressed files using numpy.fromfile

Dear Numpy Mailing List Readers,
I have a quite simple problem for which I have not found a solution so far. I have a gzipped file lying around that has some numbers stored in it, and I want to read them into a numpy array as fast as possible, but only a chunk of data at a time. So I would like to use numpy's fromfile function.
For now I have roughly the following code:
f=gzip.open( "myfile.gz", "r" )
xyz=npy.fromfile(f,dtype="float32",count=400)
So I would read 400 entries from the file, keep it open, process my data, come back and read the next 400 entries. If I do this, numpy complains that the file handle f is not a normal file handle:
IOError: first argument must be an open file
In fact it is a zlib file handle, but gzip gives access to the underlying file handle through f.fileobj.
So I tried
xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
But then I get only meaningless values (not the actual data), and when I specify the sep=" " argument for npy.fromfile I get just .1 and nothing else.
Can you tell me why this happens and how to fix it? I know that I could read everything into memory, but these files are rather big, so I simply have to avoid this.
Thanks in advance.

On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke pschmidtke@mmb.pcb.ub.es wrote:
Dear Numpy Mailing List Readers,
I have a quite simple problem for which I have not found a solution so far. I have a gzipped file lying around that has some numbers stored in it, and I want to read them into a numpy array as fast as possible, but only a chunk of data at a time. So I would like to use numpy's fromfile function.
For now I have roughly the following code:
f=gzip.open( "myfile.gz", "r" )
xyz=npy.fromfile(f,dtype="float32",count=400)
So I would read 400 entries from the file, keep it open, process my data, come back and read the next 400 entries. If I do this, numpy complains that the file handle f is not a normal file handle:
IOError: first argument must be an open file
In fact it is a zlib file handle, but gzip gives access to the underlying file handle through f.fileobj.
np.fromfile() requires a true file object, not just a file-like object. np.fromfile() works by grabbing the FILE* pointer underneath and using C system calls to read the data, not by calling the .read() method.
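For contrast, here is a minimal sketch of the case np.fromfile() is designed for: an ordinary, uncompressed file opened with open(). The file name and test data are made up for the illustration; consecutive calls should pick up where the previous one stopped.

```python
import os
import tempfile

import numpy as np

# Hypothetical test data: 1000 float32 values in a plain (uncompressed) file.
data = np.arange(1000, dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "plain.bin")
data.tofile(path)

with open(path, "rb") as f:
    # Each call reads the next 400 items from the current file position.
    first = np.fromfile(f, dtype=np.float32, count=400)
    second = np.fromfile(f, dtype=np.float32, count=400)
```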
So I tried
xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
But then I get only meaningless values (not the actual data), and when I specify the sep=" " argument for npy.fromfile I get just .1 and nothing else.
This is reading the compressed data, not the data that you want.
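A small sketch of why the values look meaningless, using a made-up file name and data: the bytes stored on disk are the gzip stream (which begins with the magic bytes 0x1f 0x8b), while the float32 values only appear after decompression through the gzip file object.

```python
import gzip
import os
import tempfile

import numpy as np

# Hypothetical test data written through gzip.
values = np.arange(8, dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "demo.gz")
with gzip.open(path, "wb") as f:
    f.write(values.tobytes())

# Raw bytes as stored on disk: this is what the underlying file handle sees.
with open(path, "rb") as raw:
    on_disk = raw.read()

# Decompressed bytes: this is what gzip's read() returns.
with gzip.open(path, "rb") as f:
    decompressed = f.read()

starts_with_gzip_magic = on_disk[:2] == b"\x1f\x8b"
roundtrip_ok = decompressed == values.tobytes()
```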
Can you tell me why this happens and how to fix it? I know that I could read everything into memory, but these files are rather big, so I simply have to avoid this.
Read in reasonably-sized chunks of bytes at a time, and use np.fromstring() to create arrays from them.

Robert Kern wrote:
f=gzip.open( "myfile.gz", "r" )
xyz=npy.fromfile(f,dtype="float32",count=400)
Read in reasonably-sized chunks of bytes at a time, and use np.fromstring() to create arrays from them.
Something like:
count = 400
xyz = np.fromstring(f.read(count*4), dtype=np.float32)
should work (untested...)
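The same idea can be sketched as a loop over the whole file. This is only an illustration under some assumptions: the helper name iter_chunks, the file name, and the test data are made up, and np.frombuffer is used in place of np.fromstring because recent NumPy releases deprecate the binary mode of fromstring. It also assumes f.read(n) returns fewer than n bytes only at the end of the file.

```python
import gzip
import os
import tempfile

import numpy as np


def iter_chunks(f, dtype=np.float32, count=400):
    """Yield arrays of up to `count` items read from the file-like object `f`."""
    itemsize = np.dtype(dtype).itemsize
    while True:
        buf = f.read(count * itemsize)
        # Ignore a trailing partial item if the data length is not a
        # multiple of the item size.
        usable = len(buf) - len(buf) % itemsize
        if usable == 0:
            break
        yield np.frombuffer(buf[:usable], dtype=dtype)


# Hypothetical test data: 1000 float32 values written through gzip.
data = np.arange(1000, dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "myfile.gz")
with gzip.open(path, "wb") as f:
    f.write(data.tobytes())

# Read the file back 400 items at a time without loading it all at once.
with gzip.open(path, "rb") as f:
    chunks = list(iter_chunks(f, count=400))

chunk_sizes = [len(c) for c in chunks]
```

Arrays returned by np.frombuffer are read-only views of the bytes that were read; call .copy() on a chunk if it needs to be modified in place.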
-Chris
participants (3)
- Christopher Barker
- Peter Schmidtke
- Robert Kern