<html>

<body>

<font color="#800000">I hope this isn't too off-topic for astro, but I

know many here work with huge files.<br><br>

I was struggling yesterday with methods of reading large disk files

into numpy efficiently (not FITS, just raw files of IEEE floats). When
loading arbitrarily large files, it would be nice not to read any more
than the plot can display before zooming in.<br><br>

From a 2 GB file, I want to read n (say 4096) evenly spaced samples.<br>

I tried making a dtype, and other tricks, to read
"Pythonically", but failed. I broke down and used a for loop
with f_obj.seek() and fromstring():<br><br>

</font><font size=1>import numpy<br>
<br>
num_channels = 9<br>
desired_len = 4096<br>
bytes_per_val = numpy.dtype(numpy.float32).itemsize<br>
f_obj = open(path, 'rb')<br>
f_obj.seek(0, 2)  # jump to the end to measure the file length<br>
file_length = f_obj.tell()<br>
f_obj.seek(0, 0)<br>
bytes_per_smp = num_channels * bytes_per_val<br>
num_samples = file_length // bytes_per_smp  # integer division<br>
stride_smps = num_samples // desired_len  # an int<br>
stride_bytes = stride_smps * bytes_per_smp<br><br>
arr = numpy.zeros((desired_len, num_channels))<br>
for i in range(desired_len):<br>
    f_obj.seek(i * stride_bytes, 0)<br>
    # frombuffer stands in for fromstring, whose binary mode is deprecated<br>
    arr[i] = numpy.frombuffer(f_obj.read(bytes_per_smp), dtype=numpy.float32,
count=num_channels)<br><br>

</font><font color="#800000">So, is there a better way to move the

pointer through the file without a for loop?<br>

Would a generator be much faster?<br><br>
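A minimal sketch of what a generator version might look like, for
comparison. It still issues one seek and one read per sample, so the I/O
pattern matches the loop. The helper name sample_rows, the demo file
demo.f32 and the small sizes are made up for illustration.<br><br>

```python
import os
import tempfile

import numpy


def sample_rows(f_obj, num_rows, stride_bytes, bytes_per_smp):
    """Yield num_rows raw samples, one seek and one read per sample."""
    for i in range(num_rows):
        f_obj.seek(i * stride_bytes, 0)
        yield f_obj.read(bytes_per_smp)


num_channels = 9
desired_len = 16  # small demo size, not the 4096 from the post
bytes_per_smp = num_channels * numpy.dtype(numpy.float32).itemsize

# Hypothetical demo file: 160 samples of 9 float32 values each.
path = os.path.join(tempfile.mkdtemp(), "demo.f32")
numpy.arange(160 * num_channels, dtype=numpy.float32).tofile(path)

stride_bytes = (160 // desired_len) * bytes_per_smp
with open(path, "rb") as f_obj:
    # Join the raw chunks, then convert them all in one call.
    raw = b"".join(sample_rows(f_obj, desired_len, stride_bytes, bytes_per_smp))

arr = numpy.frombuffer(raw, dtype=numpy.float32).reshape(desired_len, num_channels)
```

<br>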

The dtype approach and other methods like mmap fail with MemoryError,
although apparently mmap works on 64-bit systems.<br><br>
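For reference, a minimal sketch of the mmap variant, assuming a 64-bit
Python where the whole file fits in the address space. The demo file
demo.f32 and its size are made up for illustration; numpy.memmap and the
strided slice are standard NumPy.<br><br>

```python
import os
import tempfile

import numpy

num_channels = 9      # values per sample, as in the loop version
desired_len = 4096    # number of evenly spaced samples to keep

# Hypothetical demo file: 40960 samples of 9 float32 values each.
path = os.path.join(tempfile.mkdtemp(), "demo.f32")
demo = numpy.arange(40960 * num_channels, dtype=numpy.float32)
demo.reshape(-1, num_channels).tofile(path)

# Map the file without reading it; pages are loaded only when touched.
bytes_per_smp = num_channels * numpy.dtype(numpy.float32).itemsize
num_samples = os.path.getsize(path) // bytes_per_smp
mm = numpy.memmap(path, dtype=numpy.float32, mode="r",
                  shape=(num_samples, num_channels))

# A strided slice picks every stride_smps-th sample; copy the result to RAM.
stride_smps = num_samples // desired_len
arr = numpy.array(mm[::stride_smps][:desired_len])
```

<br>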

- Ray<br><br>

</font></body>

</html>