[AstroPy] striding through arbitrarily large files

Mon Feb 3 16:27:42 EST 2014

Nice tip, thank you. I will add the count= option to the fromfile()
call so that it reads only the 9 channels of each sample - otherwise
the default is to read the rest of the file.
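
A minimal sketch of that loop with seek() plus fromfile(count=...), assuming a raw file of float32 values laid out sample by sample. The function name read_strided and its parameter defaults are hypothetical, chosen only for illustration:

```python
import numpy as np

def read_strided(path, num_channels=9, desired_len=4096, dtype=np.float32):
    """Read up to desired_len evenly spaced samples of num_channels values each."""
    dt = np.dtype(dtype)
    bytes_per_smp = num_channels * dt.itemsize
    out = np.zeros((desired_len, num_channels), dtype=dt)
    with open(path, 'rb') as f:
        f.seek(0, 2)                              # jump to the end to measure the file
        num_samples = f.tell() // bytes_per_smp   # whole samples in the file
        stride_smps = max(num_samples // desired_len, 1)
        n = min(desired_len, num_samples)
        for i in range(n):
            f.seek(i * stride_smps * bytes_per_smp)
            # count= stops the read after one sample instead of running to EOF
            out[i] = np.fromfile(f, dtype=dt, count=num_channels)
    return out[:n]
```

Only about desired_len * num_channels values are ever held in memory, so this should also work on a 32-bit interpreter.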

I have to support 32-bit Python for now (on 64-bit machines), until
mayavi supports 64-bit. I haven't found a better, easily usable and
fast 3D surface library.
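
For comparison, on a 64-bit interpreter the mmap route could look roughly like the sketch below. It assumes the file holds a whole number of float32 values; sample_with_memmap is a hypothetical name, and numpy.memmap stands in here for a raw mmap call:

```python
import numpy as np

def sample_with_memmap(path, num_channels=9, desired_len=4096):
    """Return up to desired_len evenly spaced samples using a memory map."""
    # Map the file read-only; data is paged in only when it is accessed
    mm = np.memmap(path, dtype=np.float32, mode='r')
    num_samples = mm.size // num_channels
    stride = max(num_samples // desired_len, 1)
    # View the flat map as (samples, channels), dropping any partial sample
    frames = mm[:num_samples * num_channels].reshape(num_samples, num_channels)
    # Strided slicing replaces the for loop; np.array copies the result into RAM
    return np.array(frames[::stride][:desired_len])
```

The map needs address space for the whole file, which is why this tends to fail with large files under 32-bit Python.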

Ray

At 11:16 AM 2/3/2014, Erik Bray wrote:
>Indeed, normally I would suggest just using mmap, but if you really have to
>support 32-bit systems that won't help you as much as one might like. As it is,
>what you are doing is mostly what you need to do with Numpy, but what you
>thankfully don't need to do is read a string from the file and then use
>numpy.fromstring.
>
>I actually only just discovered this myself, as it is undocumented, and have
>yet to fix it in PyFITS. But if you pass numpy.fromfile an open file object it
>will read starting from wherever the file pointer is positioned, rather than
>the beginning of the file.
>
>So you can just:
>
>import numpy as np
>
>f = open('array.raw', 'rb')
>f.seek(np.dtype('float32').itemsize * offset)  # offset counts float32 values
>section = np.fromfile(f, dtype='float32')  # reads from the current position on
>
>So that at least saves you from having to perform the read() first.
>
>Erik
>
>On 02/01/2014 05:50 PM, RayS wrote:
> > I hope this isn't too off-topic for astro, but I know many here work with
> > huge files.
> >
> > I was struggling yesterday with methods of reading large disk files into
> > numpy efficiently (not FITS, just raw files of IEEE floats). When loading
> > arbitrarily large files it would be nice to not bother reading more than the
> > plot can display before zooming in.
> >
> > With a 2GB file, I want to read n (like 4096) evenly sampled points out of
> > it. I tried making a dtype, and other tricks, to read "Pythonically", but
> > failed. I broke down and used a for loop with fh.seek() and fromstring():
> >
> > import numpy
> >
> > num_channels = 9
> > desired_len = 4096
> > bytes_per_val = numpy.dtype(numpy.float32).itemsize
> > f_obj = open(path, 'rb')
> > f_obj.seek(0, 2)  ## jump to the end to measure the file
> > file_length = f_obj.tell()
> > f_obj.seek(0, 0)
> > bytes_per_smp = num_channels * bytes_per_val
> > num_samples = file_length // bytes_per_smp
> > stride_smps = num_samples // desired_len  ## an int
> > stride_bytes = stride_smps * bytes_per_smp
> >
> > arr = numpy.zeros((desired_len, num_channels))
> > for i in range(desired_len):
> >      f_obj.seek(i * stride_bytes, 0)
> >      ## 'f4' is float32; newer NumPy prefers numpy.frombuffer here
> >      arr[i] = numpy.fromstring(f_obj.read(bytes_per_smp), dtype='f4',
> >                                count=num_channels)
> >
> > So, is there a better way to move the pointer through the file without a
> > for loop? Would a generator be much faster?
> >
> > The dtype and other methods like mmap fail with MemoryError, although
> > apparently you can mmap with 64-bit systems.
> >
> > - Ray
> >
> >
> >
> > _______________________________________________
> > AstroPy mailing list
> > AstroPy at scipy.org
> > http://mail.scipy.org/mailman/listinfo/astropy
> >