[Numpy-discussion] striding through arbitrarily large files

RayS rays at blue-cove.com
Tue Feb 4 10:27:16 EST 2014


At 07:09 AM 2/4/2014, you wrote:
>On 04/02/2014 16:01, RayS wrote:
> > I was struggling with methods of reading large disk files into numpy
> > efficiently (not FITS or .npy, just raw files of IEEE floats from
> > ndarray.tostring()). When loading arbitrarily large files, it would be
> > nice not to bother reading more than the plot can display before zooming in.
> > There apparently are no built-in methods that allow skipping/striding...
>
>If you mmap the data file with np.memmap() you can access the data in a
>strided way through the numpy array interface and the OS will handle the
>scheduling of the reads from the disc.
>
>Note, however, that if the data samples you need are quite dense, there is
>no real advantage in doing this, because the OS will have to read a whole
>page anyway for each read.
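For reference, here is a minimal sketch of the memmap approach Daniele describes. It assumes the file holds nothing but native-endian float32 samples (use float64 if the data came from float64 arrays). `memmap_preview` and `fraction_of_pages_touched` are hypothetical helper names, and the 4096-byte page size is an assumption (check `mmap.PAGESIZE` on your platform).

```python
import numpy as np

def memmap_preview(path, step, dtype=np.float32):
    """Return every `step`-th sample of a raw binary file.

    np.memmap maps the whole file; slicing gives a strided view and
    np.array() copies just those samples, with the OS paging in the
    parts of the file they lie on.
    """
    mm = np.memmap(path, dtype=dtype, mode="r")  # length inferred from file size
    return np.array(mm[::step])                  # plain in-memory copy

def fraction_of_pages_touched(step, itemsize=4, page_size=4096):
    """Approximate fraction of file pages read for a strided access.

    Ignores OS read-ahead, which can only increase the amount read.
    """
    stride_bytes = step * itemsize
    if stride_bytes <= page_size:
        return 1.0                   # every page holds at least one wanted sample
    return page_size / stride_bytes  # roughly one page per wanted sample
```

With float32 samples and 4 KiB pages, any step of 1024 samples or less touches every page. That is Daniele's point about dense sampling: the strided read then costs about as much disk I/O as reading the whole file.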

Thanks Daniele, I'll be trying mmap with 64-bit Python. With 32-bit Python,
the mmap method raises MemoryError on 2.5 GB files...
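If the 64-bit route is not available, one possible workaround is to seek to each wanted sample and read it with `np.fromfile`, so the process never has to map the whole file into its address space. This is a sketch that has not been tested against 2.5 GB files. It makes the same float32 assumption, and `strided_read` is a hypothetical name.

```python
import os
import numpy as np

def strided_read(path, step, dtype=np.float32):
    """Read every `step`-th sample by seeking, without mapping the file.

    Only the requested samples are copied into memory, so this also works
    where a 32-bit process cannot map a multi-GB file. It does one seek and
    one read per output sample, so keep the output small (a few thousand
    plot points, say).
    """
    dtype = np.dtype(dtype)
    with open(path, "rb") as f:
        n_total = os.fstat(f.fileno()).st_size // dtype.itemsize
        n_out = (n_total + step - 1) // step      # ceil(n_total / step)
        out = np.empty(n_out, dtype=dtype)
        for i in range(n_out):
            f.seek(i * step * dtype.itemsize)     # byte offset of sample i*step
            out[i] = np.fromfile(f, dtype=dtype, count=1)[0]
    return out
```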
The idea is that we allow the users to inspect the huge files 
graphically, then they can "zoom" into regions of interest and then 
load a ~100 MB region en bloc for the usual spectral analysis.
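For the zoomed-in step, you can map a region on its own by giving `np.memmap` an `offset` and a `shape`. Only that window, not the whole file, then needs address space. The sketch below again assumes raw float32 samples, and `load_region` is a hypothetical helper. At 4 bytes per sample, ~100 MB is about 25 million samples.

```python
import numpy as np

def load_region(path, start, count, dtype=np.float32):
    """Copy samples [start, start + count) of a raw binary file into RAM.

    Only this window is mapped, so the address space needed is about
    count * itemsize bytes rather than the size of the whole file.
    """
    dtype = np.dtype(dtype)
    mm = np.memmap(path, dtype=dtype, mode="r",
                   offset=start * dtype.itemsize, shape=(count,))
    region = np.array(mm)   # plain in-memory copy for spectral analysis
    del mm                  # drop the mapping; `region` does not depend on it
    return region
```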

- Ray 



