[Numpy-discussion] Loading a > GB file into array

Hans Meine meine at informatik.uni-hamburg.de
Fri Dec 21 08:14:25 EST 2007


On Friday, 21 December 2007 13:23:49, David Cournapeau wrote:
> > Instead of saying "memmap is ALL about disk access" I would rather
> > say that "memmap is all about SMART disk access" -- what I mean
> > is that memmap should run as fast as a normal ndarray if it works on
> > the cached part of an array.  Maybe there is a way of telling memmap
> > when and what to cache and when to sync that cache to the disk.
> > In other words, memmap should perform just like an in-physical-memory
> > array -- only that it once in a while saves/loads to/from the disk.
> > Or is this just wishful thinking?
> > Is there a way of "preloading" a given part into cache
> > (physical memory) or of preventing disk writes at "bad times"?
> > How about doing the sync from a different thread ;-)
>
> mmap uses the OS I/O caches; that's kind of the point of using mmap
> (at least in this case). Instead of you doing the caching yourself, the
> OS does it for you, and OSes are supposed to be smart about this :)

AFAICS, this is what Sebastian meant as well, but as the OP indicated,
preloading (e.g. by reading the whole array once) did not work for him.
Thus, I understand Sebastian's questions as "is it possible to help the OS
when it is not smart enough?".  Maybe something along the lines of mlock,
only not quite as aggressive.
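
To make "helping the OS" a bit more concrete, here is a minimal sketch of two
gentler-than-mlock options, using only Python's standard mmap module.  It
assumes Python 3.8 or later (where mmap objects have a madvise() method on
platforms that support it); the helper names advise_willneed, touch_pages and
demo are made up for illustration.  MADV_WILLNEED is only a hint, so the kernel
is free to ignore it, and neither approach pins pages in memory the way mlock
does.

```python
import mmap
import tempfile


def advise_willneed(mm, offset=0, length=None):
    """Hint that [offset, offset+length) will be needed soon.

    Returns True if the hint was issued, False if this platform or Python
    build does not expose madvise/MADV_WILLNEED.
    """
    if not hasattr(mm, "madvise") or not hasattr(mmap, "MADV_WILLNEED"):
        return False
    if length is None:
        length = len(mm) - offset
    # madvise wants a page-aligned start, so round the offset down
    # and grow the length by the same amount.
    aligned = (offset // mmap.PAGESIZE) * mmap.PAGESIZE
    length += offset - aligned
    mm.madvise(mmap.MADV_WILLNEED, aligned, length)
    return True


def touch_pages(mm):
    """Portable fallback: read one byte per page to fault every page in.

    Returns the number of pages touched.
    """
    count = 0
    for pos in range(0, len(mm), mmap.PAGESIZE):
        mm[pos]  # the read itself is what triggers the page fault
        count += 1
    return count


def demo(n_pages=4):
    """Map a small temporary file and try both preloading approaches."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * (n_pages * mmap.PAGESIZE))
        f.flush()
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            hinted = advise_willneed(mm)
            pages = touch_pages(mm)
        finally:
            mm.close()
    return hinted, pages


if __name__ == "__main__":
    print(demo())
```

Neither call guarantees that the pages stay resident: under memory pressure
the OS can still evict them, which is exactly the difference from mlock.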

Ciao, /  /
     /--/
    /  / ANS


