[Numpy-discussion] Accessing a numpy array in a mmap fashion

Anne Archibald peridot.faceted at gmail.com
Thu Aug 30 11:34:11 EDT 2007


On 30/08/2007, Brian Donovan <donovan at mirsl.ecs.umass.edu> wrote:
> Hello all,
>
>   I'm wondering if there is a way to use a numpy array that uses disk as a
> memory store rather than ram. I'm looking for something like mmap but which
> can be used like a numpy array. The general idea is this. I'm simulating a
> system which produces a large dataset over a few hours of processing time.
> Rather than store the numpy array in memory during processing I'd like to
> write the data directly to disk but still be able to treat the array as a
> numpy array. Is this possible? Any ideas?

You want numpy.memmap:
http://mail.python.org/pipermail/python-list/2007-May/443036.html

This will do exactly what you want, though there may be a few rough
edges, and you may have problems with arrays bigger than a few
gigabytes, particularly on 32-bit systems. You will probably need to
create the file first.
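
A minimal sketch of what that might look like, assuming a reasonably
recent NumPy, where opening with mode="w+" creates the file at its
full size for you. The file name, shape and fill values are made up
for illustration and are not from the original question:

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and array shape, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "simulation_output.dat")
shape = (1000, 64)

# mode="w+" creates (or overwrites) the file at the full size up front.
data = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)

# Fill the array one row at a time, as a simulation step might.
for step in range(shape[0]):
    data[step, :] = step

# Push any pending changes out to the file, then drop the mapping.
data.flush()
del data

# Reopen read-only later. The file holds raw bytes only, so the dtype
# and shape have to be supplied again.
readback = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
```

The resulting object can be indexed and sliced like an ordinary
array; only the pages you actually touch need to be in memory.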

Keep in mind that if the array is actually temporary, the virtual
memory system will push unused parts out to disk as memory fills up,
so there's no need to use memmap explicitly. If you want the array
permanently on disk, though, memmap is probably the most convenient
way to do it - though if your access patterns are not local (that is,
if they jump around the array rather than touching nearby elements)
it may involve a lot of thrashing. Sequential disk writes have the
advantage (?) of forcing you to write code that accesses the disk in
a local fashion.
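
For comparison, a rough sketch of the sequential alternative: each
step's result is appended to a plain binary file and the whole thing
is read back afterwards. The file name, sizes and the per-step "row"
are placeholders I made up, not anything from the original
simulation:

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and sizes, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "sequential_output.dat")
n_steps, n_cols = 200, 64

# Append one row per step; only the current row is held in memory.
with open(path, "wb") as f:
    for step in range(n_steps):
        row = np.full(n_cols, step, dtype=np.float64)  # stand-in for real output
        row.tofile(f)

# Read the raw bytes back and restore the 2-D shape.
result = np.fromfile(path, dtype=np.float64).reshape(n_steps, n_cols)
```

Here the file is only ever written front to back, so the access
pattern is local by construction. The cost is that you cannot go back
and modify earlier rows without reopening the file some other way.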

Anne


