mmap caching
George Sakkis
george.sakkis at gmail.com
Sun Jan 21 16:32:19 EST 2007
Nick Craig-Wood wrote:
> George Sakkis <george.sakkis at gmail.com> wrote:
> > I've been trying to track down a memory leak (which I initially
> > attributed erroneously to numpy) and it turns out to be caused by a
> > memory-mapped file. It seems that mmap caches the chunks it reads
> > without limit, as the memory usage grows to several hundred MB according
> > to the Windows task manager before it dies with a MemoryError. I'm
> > positive that these chunks are not referenced anywhere else; in fact if
> > I change the mmap object to a normal file, memory usage remains
> > constant. The documentation of mmap doesn't mention anything about
> > this. Can the caching strategy be modified at the user level?
>
> I'm not familiar with mmap() on Windows, but assuming it works the
> same way as on Unix...
>
> The point of mmap() is to map files into memory. It is completely up
> to the OS to bring pages into memory for you to read / write to, and
> completely up to the OS to get rid of them again.
>
> What you would expect is that the file is demand paged into memory as
> you access bits of it. These pages will remain in memory until the OS
> feels some memory pressure, at which point the pages will be written
> out if dirty and then dropped.
>
> The OS will try to keep hold of pages as long as possible just in case
> you need them again. The pages dropped should be the least recently
> used pages.
>
> I wouldn't have expected a MemoryError though...
>
> Did you do mmap.flush() after writing?
The file is written once and then opened read-only, so there's no
flushing. If caching is completely up to the OS, I take it that my
options are either (1) modify my algorithms so that they work in
fixed-size batches instead of arbitrarily long sequences or (2)
implement my own memory-mapping scheme to fit my algorithms. I guess
(1) would be less trouble overall. Or is there a way to give the OS a
hint about how large a cache it can use?
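
For what it's worth, here is a rough sketch of what option (1) might
look like. It is only an illustration, not something I have run against
the real data: iter_windows and window_size are made-up names, and it
assumes a Python recent enough that mmap.mmap() takes an offset argument
and works as a context manager. The idea is to map one fixed-size window
of the file at a time and release it before mapping the next, so the
amount mapped into the process at once stays bounded.

```python
import mmap
import os


def iter_windows(path, window_size=16 * mmap.ALLOCATIONGRANULARITY):
    """Yield (offset, view) pairs, mapping one window of the file at a time.

    Hypothetical helper: the name and default window size are illustrative.
    Each view is closed when the caller asks for the next one, so the
    caller must not keep a reference to it (copy with view[:] if needed).
    """
    # mmap offsets must be multiples of the allocation granularity.
    if window_size <= 0 or window_size % mmap.ALLOCATIONGRANULARITY:
        raise ValueError(
            "window_size must be a positive multiple of "
            "mmap.ALLOCATIONGRANULARITY"
        )
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < file_size:
            # The last window may be shorter than window_size.
            length = min(window_size, file_size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as view:
                yield offset, view
            # Leaving the with block above has unmapped this window.
            offset += length
```

It would be used along the lines of
"for offset, view in iter_windows(filename): process(view)", where
process is whatever per-batch step the algorithm needs. Whether the OS
keeps the underlying pages in its own file cache afterwards is still up
to the OS; this only limits how much is mapped at any one time.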
George