using mmap on large (> 2 Gig) files

sturlamolden sturlamolden at yahoo.no
Mon Oct 23 20:06:55 EDT 2006


myeates at jpl.nasa.gov wrote:

> Anyone ever done this? It looks like Python2.4 won't take a length arg

http://docs.python.org/lib/module-mmap.html

It seems that Python does take a length argument, but not an offset
argument (unlike Windows' CreateFileMapping/MapViewOfFile and the Unix
mmap() call), so you always map from the beginning of the file. Of
course, if you have ever worked with memory-mapped files in C, you have
probably experienced that mapping a large file from beginning to end is
a major slowdown. And if the file is big enough, it does not even fit
inside the 32-bit address space of your process. Thus you have to limit
the portion of the file that is mapped, using the offset and length
arguments.
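
Python 2.6 and later do accept an offset argument to mmap.mmap(), and the
offset must be a multiple of mmap.ALLOCATIONGRANULARITY. A minimal sketch,
assuming Python 3, that maps only a window of a large file; the helper name
read_window and its rounding logic are illustrative, not part of the mmap
module:

```python
import mmap


def read_window(path: str, start: int, size: int) -> bytes:
    """Map only bytes [start, start + size) of the file and return them.

    Hypothetical helper. mmap requires the offset to be a multiple of
    mmap.ALLOCATIONGRANULARITY, so round `start` down to that boundary and
    skip the few extra bytes at the front of the view.
    """
    aligned = start - (start % mmap.ALLOCATIONGRANULARITY)
    delta = start - aligned  # bytes between the aligned offset and `start`
    with open(path, "rb") as f:
        # Only delta + size bytes are mapped, not the whole file.
        with mmap.mmap(f.fileno(), delta + size,
                       access=mmap.ACCESS_READ, offset=aligned) as view:
            return view[delta:delta + size]
```

Mapping a small window at a time keeps the address-space footprint bounded
even when the file itself is larger than a 32-bit process could map in one
piece.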

But the question remains whether Python's "mmap" qualifies as a "memory
mapping" at all. Memory mapping a file means that the file is "mapped"
into the process address space. So if you access a certain address
(using a pointer type in C), you will actually read from or write to
the file. On Windows, this mechanism is even used to access "files"
that do not live on the file system. E.g. if CreateFileMapping is
called with the file handle set to INVALID_HANDLE_VALUE, it creates a
file mapping backed by the OS paging file. That is, you actually obtain
a shared memory segment, usable e.g. for inter-process communication.
How would you use Python's mmap for something like this?
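
For comparison, in current Python 3, passing -1 as the file descriptor to
mmap.mmap() creates an anonymous mapping that is not backed by a named file;
on Unix the default flags value is MAP_SHARED, so the pages stay shared with
a child created by os.fork(). A minimal Unix-only sketch; the function name
share_via_fork is hypothetical:

```python
import mmap
import os


def share_via_fork(message: bytes) -> bytes:
    """Child writes `message` into an anonymous shared mapping; parent reads it.

    Hypothetical Unix-only helper (needs os.fork). `message` must be
    non-empty, since a zero-length mapping is rejected.
    """
    buf = mmap.mmap(-1, len(message))  # anonymous, zero-filled, MAP_SHARED
    try:
        pid = os.fork()
        if pid == 0:               # child process
            buf[:] = message       # write straight into the shared pages
            os._exit(0)            # leave immediately, skipping cleanup
        os.waitpid(pid, 0)         # parent: wait until the child has written
        return buf[:]              # the child's write is visible here
    finally:
        buf.close()
```

On Windows, -1 likewise gives an anonymous mapping, and the tagname argument
names it so that another process can open the same mapping.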

I haven't looked at the source, but I'd be surprised if Python actually
maps the file into the process image when mmap is called. I believe
Python is not memory mapping at all; rather, it just opens a file in
the file system and uses fseek to move around. That is, you can use
slicing operators on Python's "memory mapped file object" as if it were
a list or a string, but it's not really memory mapping, it's just a
syntactic convenience. Because of this, you even need to manually
"flush" the memory mapping object. If you were talking to a real memory
mapped file, flushing would obviously not be required.
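
For reference, a short Python 3 sketch of writing through a mapping with
slice assignment and then calling flush(); the helper name patch_file is
hypothetical. In CPython, mmap.flush() is implemented with msync() on Unix
and FlushViewOfFile() on Windows, the same calls a C program uses to force a
mapped view's changes out to the file.

```python
import mmap


def patch_file(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes of an existing file at `offset` via mmap."""
    with open(path, "r+b") as f:  # a writable mapping needs a read/write fd
        with mmap.mmap(f.fileno(), 0) as m:  # length 0: map the whole file
            if offset + len(data) > len(m):
                raise ValueError("patch would extend past end of file")
            m[offset:offset + len(data)] = data  # plain slice assignment
            m.flush()  # write dirty pages back to the file now
```

Note that a mapping cannot grow the file this way; the slice must fit inside
the mapped length.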

This probably means that your problem does not arise. Even if the file
is too large to fit inside a 32-bit process image, Python's memory
mapping would not be affected by this, as it is not memory mapping the
file when "mmap" is called.




More information about the Python-list mailing list