At 07:35 AM 2/4/2014, Julian Taylor wrote:
> On Tue, Feb 4, 2014 at 4:27 PM, RayS <rays@blue-cove.com> wrote:
>> At 07:09 AM 2/4/2014, you wrote:
>>> On 04/02/2014 16:01, RayS wrote:
>>>> I was struggling with methods of reading large disk files into numpy
>>>> efficiently (not FITS or .npy, just raw files of IEEE floats from
>>>> numpy.tostring()). When loading arbitrarily large files it would be nice
>>>> to not bother reading more than the plot can display before zooming in.
>>>> There apparently are no built-in methods that allow skipping/striding...
>>>
>>> If you mmap the data file with np.memmap() you can access the data in a
>>> strided way through the numpy array interface and the OS will handle the
>>> scheduling of the reads from the disc.
>>>
>>> Note however that if the data samples you need are quite dense, there is
>>> no real advantage in doing this because the OS will have to read a whole
>>> page anyway for each read.
>>
>> Thanks Daniele, I'll be trying mmap with Python64. With 32 bit the
>> mmap method throws MemoryError with 2.5GB files...
>> The idea is that we allow the users to inspect the huge files
>> graphically, then they can "zoom" into regions of interest and then
>> load ~100 MB en bloc for the usual spectral analysis.
>
> Memory maps are limited to the size of the available address space (31
> bits with sign), so you would have to slide them; see e.g. the smmap
> module.
> But it's not likely this is going to be much faster than a loop with
> explicit seeks, depending on the sparseness of the data. Memory maps have
> relatively high overheads at the operating system level.
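
For reference, a minimal sketch of the np.memmap approach Daniele describes above, with the strided overview and the "zoom in, then load a block" step; the file name "samples.f64" and the little-endian float64 dtype are assumptions, not from the thread:

import numpy as np

# Assumptions (not from the thread): the raw file "samples.f64" holds
# little-endian float64 samples written with ndarray.tostring().
fname = "samples.f64"
dtype = np.dtype("<f8")

# Map the whole file read-only; the shape is inferred from the file size.
# On 32-bit Python this is where MemoryError shows up for multi-GB files,
# since the whole map has to fit in the process address space.
data = np.memmap(fname, dtype=dtype, mode="r")

# Strided overview for plotting: every 10000th sample.  Only the pages
# actually touched are read; .copy() pulls the result into ordinary RAM.
overview = data[::10000].copy()

# After zooming in, load one contiguous block (~100 MB of float64) for analysis.
start = 1000000
count = 12500000              # 12.5e6 float64 samples is ~100 MB
block = np.array(data[start:start + count])
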
Yes, very sparse data - ~4k samples out of 50 million.
I hadn't tried smmap (https://github.com/Byron/smmap) - thanks.
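
And a rough sketch of the explicit-seek alternative Julian mentions, which fits the very sparse case (a few thousand samples out of 50 million); the file name, dtype, and the evenly spaced index set are again assumptions:

import numpy as np

# Assumptions (not from the thread): same raw little-endian float64 file,
# and a hypothetical set of ~4k evenly spaced sample indices.
fname = "samples.f64"
dtype = np.dtype("<f8")

def read_samples(fname, indices, dtype):
    """Read individual samples at the given indices with explicit seeks."""
    itemsize = dtype.itemsize
    out = np.empty(len(indices), dtype=dtype)
    with open(fname, "rb") as f:
        for i, idx in enumerate(indices):
            f.seek(int(idx) * itemsize)
            out[i] = np.fromfile(f, dtype=dtype, count=1)[0]
    return out

indices = np.linspace(0, 50000000 - 1, 4096).astype(np.int64)
overview = read_samples(fname, indices, dtype)

At this sparseness each sample costs one seek plus one small read, and nothing has to fit in the address space.
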
- Ray