[Matrix-SIG] An Experiment in code-cleanup.

Phil Austin phil@geog.ubc.ca
Tue, 8 Feb 2000 15:33:18 -0800 (PST)

Travis Oliphant writes:
 > 3) Facility for memory-mapped dataspace in arrays.

For the NumPy users who are as ignorant about mmap, msync,
and madvise as I am, I've put a couple of documents on
my web site:

1) http://www.geog.ubc.ca/~phil/mmap/mmap.pdf

A PDF version of Kevin Sheehan's paper: "Why aren't you
using mmap yet?" (19 page Frame PostScript original, page order
back to front).  He gives a good discussion of the SVR4 VM model,
with some mmap examples in C.

2) http://www.geog.ubc.ca/~phil/mmap/threads.html

An archived email exchange (initially on the F90 mailing list) between
Kevin (who is an independent Solaris consultant) and Brian Sumner
(SGI) about the pros and cons of using mmap.  

Executive summary:

i) mmap on Solaris can be a very big win (see bottom of
http://www.geog.ubc.ca/~phil/mmap/msg00003.html) when
used in combination with MADV_WILLNEED/MADV_DONTNEED madvise calls
to guide the page prefetching.

ii) IRIX and some other Unices (Linux 2.2 in particular) haven't
implemented madvise, and naive use of mmap without madvise can produce
lots of page faulting and much slower I/O than, say, asynchronous I/O
calls on IRIX.  (http://www.geog.ubc.ca/~phil/mmap/msg00009.html)
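
To make point i) concrete, here is a minimal sketch of the
hint-as-you-go pattern, written against the Python standard library
mmap module rather than anything in NumPy.  It assumes a Python new
enough to have mmap.madvise (3.8 or later); the function name
sum_bytes_with_hints and the chunk size are made up for illustration.
On a platform without madvise the hints are simply skipped, which is
the "naive mmap" case from point ii).

```python
import mmap
import os


def sum_bytes_with_hints(path, chunk=1 << 20):
    """Sum every byte of a file through a read-only mapping.

    chunk must be a multiple of mmap.PAGESIZE, because madvise wants
    a page-aligned start offset.  (Hypothetical helper, for illustration.)
    """
    total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Both the method and the constants are missing on platforms
            # that lack madvise, so check before every use.
            can_advise = hasattr(m, "madvise")
            for start in range(0, size, chunk):
                length = min(chunk, size - start)
                if can_advise and hasattr(mmap, "MADV_WILLNEED"):
                    # Ask the pager to start reading this range now.
                    m.madvise(mmap.MADV_WILLNEED, start, length)
                total += sum(m[start:start + length])
                if can_advise and hasattr(mmap, "MADV_DONTNEED"):
                    # Tell the pager we are done with this range.
                    m.madvise(mmap.MADV_DONTNEED, start, length)
    return total
```

A real prefetcher would probably issue the WILLNEED hint one or more
chunks ahead of the read; the sketch hints the current chunk only to
keep the bookkeeping short.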

So I'd love to see mmap in NumPy, but we may need to produce a
tutorial outlining the tradeoffs, and giving some examples of
madvise/msync/mmap used together (with a few benchmarks).  Any mmap
module would need to include member functions that call madvise/msync
for the mmapped array (but these may be no-ops on several popular OSes).
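
As a rough idea of what such member functions might look like, here
is a sketch using only the Python standard library (mmap plus a
memoryview cast to C doubles).  The class MappedDoubles and its
advise/sync/close methods are hypothetical names, not a proposed
NumPy API.  mmap.flush plays the role of msync, and advise
quietly does nothing where the hint or madvise itself is unavailable.

```python
import mmap
import os


class MappedDoubles:
    """Hypothetical wrapper: a file of C doubles used like an array."""

    def __init__(self, path, count):
        if count <= 0:
            raise ValueError("count must be positive")
        nbytes = count * 8  # 8 bytes per C double
        # Create the file if needed and size it; new bytes read as zero.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        os.ftruncate(self._fd, nbytes)
        self._map = mmap.mmap(self._fd, nbytes)
        # Writable view of the mapping as doubles.
        self.data = memoryview(self._map).cast("d")

    def advise(self, option_name):
        """Apply a hint such as "MADV_WILLNEED" to the whole mapping.

        Returns False (a no-op) when the platform lacks the hint.
        """
        option = getattr(mmap, option_name, None)
        if option is None or not hasattr(self._map, "madvise"):
            return False
        self._map.madvise(option)
        return True

    def sync(self):
        """Write modified pages back to the file (the msync step)."""
        self._map.flush()

    def close(self):
        # The view must be released before the mapping can be closed.
        self.data.release()
        self._map.close()
        os.close(self._fd)
```

Typical use would be a = MappedDoubles("data.bin", 1000), then
a.advise("MADV_WILLNEED"), assignments such as a.data[0] = 1.5,
and finally a.sync() and a.close().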

Regards, Phil