[AstroPy] striding through arbitrarily large files
RayS
rays at blue-cove.com
Sat Feb 1 17:50:22 EST 2014
I hope this isn't too off-topic for astro, but I know many here work
with huge files.
I was struggling yesterday with methods of reading large disk files
into numpy efficiently (not FITS, just raw files of IEEE floats).
When loading arbitrarily large files, it would be nice not to read more
than the plot can display before zooming in.
From a 2 GB file, for example, I want to read n (say, 4096) evenly spaced samples.
I tried building a custom dtype, and other tricks, to read the data
"Pythonically", but failed. I broke down and used a for loop with
f_obj.seek() and numpy.fromstring():
import numpy

num_channels = 9
desired_len = 4096
bytes_per_val = numpy.dtype(numpy.float32).itemsize

f_obj = open(path, 'rb')
f_obj.seek(0, 2)                    # seek to the end to get the file length
file_length = f_obj.tell()
f_obj.seek(0, 0)

bytes_per_smp = num_channels * bytes_per_val
num_samples = file_length // bytes_per_smp
stride_smps = num_samples // desired_len    # integer stride, in samples
stride_bytes = stride_smps * bytes_per_smp

arr = numpy.zeros((desired_len, num_channels))
for i in range(desired_len):
    f_obj.seek(i * stride_bytes, 0)
    arr[i] = numpy.fromstring(f_obj.read(bytes_per_smp),
                              dtype=numpy.float32, count=num_channels)
So, is there a better way to move the pointer through the file
without a for loop?
Would a generator be much faster?
The dtype approach and methods like mmap fail with a MemoryError here,
although apparently you can mmap files this large on 64-bit systems.
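For what it's worth, here is one loop-free sketch using numpy.memmap, which on a 64-bit system should handle files larger than RAM, since pages are only read when actually touched by the strided slice. The file name and the synthetic data written to it are stand-ins for illustration; the real file layout (9 interleaved float32 channels per sample) is assumed from the code above.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for the big raw file: rows of 9 float32 channels.
num_channels = 9
num_samples_total = 100000
desired_len = 4096

tmp = tempfile.NamedTemporaryFile(suffix='.raw', delete=False)
tmp.write(np.arange(num_samples_total * num_channels,
                    dtype=np.float32).tobytes())
tmp.close()

# Memory-map the file read-only; nothing is loaded into RAM yet.
mm = np.memmap(tmp.name, dtype=np.float32, mode='r')

# View the flat data as (num_samples, num_channels).
num_samples = mm.size // num_channels
samples = mm[:num_samples * num_channels].reshape(num_samples, num_channels)

# A strided slice pulls in only the pages holding the selected rows.
stride = max(num_samples // desired_len, 1)
arr = np.asarray(samples[::stride][:desired_len])   # copy into plain ndarray

del mm
os.unlink(tmp.name)
```

Whether this beats the seek() loop would depend on the stride and the OS page cache, but it does avoid the explicit Python-level loop.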
- Ray