[Numpy-discussion] striding through arbitrarily large files

RayS rays at blue-cove.com
Tue Feb 4 10:46:56 EST 2014

At 07:35 AM 2/4/2014, Julian Taylor wrote:
>On Tue, Feb 4, 2014 at 4:27 PM, RayS <rays at blue-cove.com> wrote:
>At 07:09 AM 2/4/2014, you wrote:
> >On 04/02/2014 16:01, RayS wrote:
> > > I was struggling with methods of reading large disk files into numpy
> > > efficiently (not FITS or .npy, just raw files of IEEE floats from
> > > numpy.tostring()). When loading arbitrarily large files it would be nice
> > > to not bother reading more than the plot can display before zooming in.
> > > There apparently are no built-in methods that allow skipping/striding...
> >
> >If you mmap the data file with np.memmap() you can access the data in a
> >strided way through the numpy array interface and the OS will handle the
> >scheduling of the reads from the disc.
> >
> >Note, however, that if the data samples you need are quite dense, there is
> >no real advantage in doing this, because the OS will have to read a whole
> >page anyway for each read.
>Thanks Daniele, I'll be trying mmap with 64-bit Python. With 32-bit Python the
>mmap method throws MemoryError with 2.5 GB files...
>The idea is that we allow the users to inspect the huge files
>graphically; they can then "zoom" into regions of interest and
>load a ~100 MB block at once for the usual spectral analysis.
>Memory maps are limited to the size of the available address space
>(31 bits with sign), so you would have to slide them; see e.g. the
>smmap module.
>But it's not likely this is going to be much faster than a loop with
>explicit seeks, depending on the sparseness of the data. Memory maps
>have relatively high overheads at the operating system level.
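
A minimal sketch of the "slide the map" idea, combined with Daniele's strided np.memmap access. It assumes a headerless file of native-endian float64 samples (what tostring(), now tobytes(), writes for a float64 array); the function name decimate_memmap and the window size are hypothetical choices, not from the thread, and this uses plain np.memmap with its offset/shape parameters rather than the smmap API. Only window_items samples are mapped at a time, which keeps address-space use bounded on 32-bit Python.

```python
import os

import numpy as np


def decimate_memmap(path, step, dtype=np.float64, window_items=16 * 1024 * 1024):
    """Return every `step`-th sample of a raw binary file (hypothetical helper).

    The file is mapped one window at a time instead of all at once.
    """
    dtype = np.dtype(dtype)
    itemsize = dtype.itemsize
    n_items = os.path.getsize(path) // itemsize  # trailing partial item is ignored
    # Keep the window a multiple of `step` so the stride phase is the same in every window.
    window_items = max(step, (window_items // step) * step)
    chunks = []
    for start in range(0, n_items, window_items):
        count = min(window_items, n_items - start)
        mm = np.memmap(path, dtype=dtype, mode="r",
                       offset=start * itemsize, shape=(count,))
        chunks.append(np.array(mm[::step]))  # copy the strided samples out of the map
        del mm  # drop the only reference so this window can be unmapped
    if not chunks:
        return np.empty(0, dtype=dtype)
    return np.concatenate(chunks)
```

The result equals the full array sliced with [::step]. As Daniele notes, if step is small relative to a page (4 KiB holds 512 float64 samples) the OS still ends up reading essentially the whole file.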

Yes, very sparse data: ~4k samples out of 50 million.
I hadn't tried smmap (https://github.com/Byron/smmap); thanks.
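
For data this sparse, the explicit-seek loop Julian mentions might look like the sketch below. It again assumes a headerless file of native-endian float64 samples; read_sparse and its evenly spaced sampling are hypothetical illustrations, not code from the thread. Each sample costs one seek and one itemsize-byte read, so no memory map (and no large address-space reservation) is needed.

```python
import os

import numpy as np


def read_sparse(path, n_samples, dtype=np.float64):
    """Read up to `n_samples` evenly spaced samples via seek + small read (hypothetical helper)."""
    dtype = np.dtype(dtype)
    itemsize = dtype.itemsize
    n_items = os.path.getsize(path) // itemsize
    if n_items == 0 or n_samples <= 0:
        return np.empty(0, dtype=dtype)
    step = max(1, n_items // n_samples)
    indices = np.arange(0, n_items, step)[:n_samples]  # sample positions, in items
    out = np.empty(len(indices), dtype=dtype)
    with open(path, "rb") as f:
        for i, idx in enumerate(indices):
            f.seek(int(idx) * itemsize)  # byte offset of the sample
            out[i] = np.frombuffer(f.read(itemsize), dtype=dtype)[0]
    return out
```

With ~4k samples from 50 million, step is 12500 items (100 KB for float64), so each read touches its own page and the total I/O is only a few thousand pages rather than the whole file.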

- Ray
