<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Feb 4, 2014 at 4:27 PM, RayS <span dir="ltr"><<a href="mailto:rays@blue-cove.com" target="_blank">rays@blue-cove.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">At 07:09 AM 2/4/2014, you wrote:<br>

>On 04/02/2014 16:01, RayS wrote:<br>

> > I was struggling with  methods of reading large disk files into numpy<br>

> > efficiently (not FITS or .npy, just raw files of IEEE floats from<br>

> > numpy.tostring()). When loading arbitrarily large files it would be nice<br>

> > to not bother reading more than the plot can display before zooming in.<br>

> > There apparently are no built in methods that allow skipping/striding...<br>

><br>

>If you mmap the data file with np.memmap() you can access the data in a<br>

>strided way through the numpy array interface and the OS will handle the<br>

>scheduling of the reads from the disc.<br>

><br>

>Note however that if the data samples you need are quite dense, there is<br>

>no real advantage in doing this because the OS will have to read a whole<br>

>page anyway for each read.<br>

<br>

</div>Thanks Daniele, I'll be trying mmap with Python64. With 32 bit the<br>

mmap method throws MemoryError with 2.5GB files...<br>

The idea is that we allow the users to inspect the huge files<br>

graphically, then they can "zoom" into regions of interest and then<br>

load ~100 MB en bloc for the usual spectral analysis.<br>

<br></blockquote><div><br></div><div>Memory maps are limited to the size of the available address space (31 bits with sign, i.e. about 2 GB, in a 32-bit process), so you would have to slide a smaller window over the file; see e.g. the smmap module.<br></div><div>But it's not likely that this will be much faster than a loop with explicit seeks, depending on how sparse the samples you need are: memory maps have relatively high overheads at the operating system level.<br>
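<br>
To make the two options concrete, here is a rough sketch rather than a tuned implementation. It assumes the file is a headerless dump of float32 samples (adjust dtype to whatever was actually written); the function names, the stride parameter and the window size are made up for illustration. The first function reads every stride-th sample with explicit seeks, the second slides a bounded np.memmap window over the file so that no single mapping has to cover the whole file.<br>
<pre>
```python
import os

import numpy as np


def read_strided_seek(path, stride, dtype=np.float32):
    """Read every `stride`-th sample of a raw binary file with explicit seeks."""
    dtype = np.dtype(dtype)  # assumption: raw samples, no header
    itemsize = dtype.itemsize
    n_total = os.path.getsize(path) // itemsize
    n_out = (n_total + stride - 1) // stride  # ceil(n_total / stride)
    out = np.empty(n_out, dtype=dtype)
    with open(path, "rb") as f:
        for i in range(n_out):
            f.seek(i * stride * itemsize)  # jump straight to the wanted sample
            out[i] = np.frombuffer(f.read(itemsize), dtype=dtype)[0]
    return out


def read_strided_windows(path, stride, dtype=np.float32, window_items=2**20):
    """Same result, but mapping one bounded window of the file at a time."""
    dtype = np.dtype(dtype)
    n_total = os.path.getsize(path) // dtype.itemsize
    # Make the window a multiple of the stride so that every window
    # starts exactly on a sample we want to keep.
    window_items = max(stride, (window_items // stride) * stride)
    pieces = []
    for start in range(0, n_total, window_items):
        n = min(window_items, n_total - start)
        mm = np.memmap(path, dtype=dtype, mode="r",
                       offset=start * dtype.itemsize, shape=(n,))
        pieces.append(np.array(mm[::stride]))  # copy out of the mapping
        del mm  # drop the mapping before opening the next window
    if not pieces:
        return np.empty(0, dtype=dtype)
    return np.concatenate(pieces)
```
</pre>
Both should return the same decimated array. Which one is faster will depend on the stride and on the OS, so it is worth timing both on a real file.<br>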

</div></div></div></div>