<html>
<body>
<font color="#800000">I hope this isn't too off-topic for astro, but I
know many here work with huge files.<br><br>
I was struggling yesterday with methods of reading large disk files
into numpy efficiently (not FITS, just raw files of IEEE floats). When
loading arbitrarily large files it would be nice to not bother reading
more than the plot can display before zooming in.<br><br>
From a 2 GB file, I want to read n (say, 4096) evenly spaced samples.<br>
I tried building a dtype, and other tricks, to read the file
"Pythonically", but failed. In the end I broke down and used a for loop
with f_obj.seek() and numpy.frombuffer().<br><br>
</font><font size=1>import numpy<br><br>
num_channels = 9<br>
desired_len = 4096<br>
bytes_per_val = numpy.dtype(numpy.float32).itemsize<br>
f_obj = open(path, 'rb')<br>
f_obj.seek(0, 2)  # jump to the end of the file to measure its length<br>
file_length = f_obj.tell()<br>
f_obj.seek(0, 0)<br>
bytes_per_smp = num_channels * bytes_per_val<br>
num_samples = file_length // bytes_per_smp<br>
stride_smps = num_samples // desired_len  ## an int<br>
stride_bytes = stride_smps * bytes_per_smp<br><br>
arr = numpy.zeros((desired_len, num_channels))<br>
for i in range(desired_len):<br>
&nbsp;&nbsp;&nbsp;&nbsp;f_obj.seek(i * stride_bytes, 0)<br>
&nbsp;&nbsp;&nbsp;&nbsp;arr[i] = numpy.frombuffer(f_obj.read(bytes_per_smp), dtype=numpy.float32, count=num_channels)<br>
f_obj.close()<br><br>
</font><font color="#800000">So, is there a better way to move the
pointer through the file without a for loop?<br>
Would a generator be much faster?<br><br>
The dtype approach and other methods such as mmap fail with MemoryError,
although apparently mmap works on 64-bit systems.<br><br>
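A minimal sketch of that 64-bit mmap route, assuming the file holds nothing
but float32 frames of 9 channels each; sample_frames is a hypothetical helper
name, not an existing NumPy function. numpy.memmap maps the file rather than
reading it, so the strided slice should only touch the pages that hold the
sampled frames:<br><br>
</font><pre>
```python
import numpy as np


def sample_frames(path, num_channels=9, desired_len=4096, dtype=np.float32):
    """Return about desired_len evenly spaced frames from a raw binary file."""
    # Map the file instead of reading it; pages are loaded only when touched.
    flat = np.memmap(path, dtype=dtype, mode='r')
    num_samples = flat.size // num_channels
    # Drop any trailing partial frame, then view the data as (frames, channels).
    frames = flat[:num_samples * num_channels].reshape(num_samples, num_channels)
    # Same integer stride as the seek() loop, but never smaller than 1.
    stride = max(1, num_samples // desired_len)
    # The strided slice is a lazy view; np.array() copies only the chosen rows.
    return np.array(frames[::stride][:desired_len])
```
</pre><font color="#800000">
Called as sample_frames(path), this should return an array of shape
(4096, 9) for any file with at least 4096 frames, with no explicit loop
over seek().<br><br>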
- Ray<br><br>
</font></body>
</html>