using mmap on large (> 2 Gig) files

sturlamolden sturlamolden at yahoo.no
Tue Oct 24 18:47:25 EDT 2006


Donn Cave wrote:
> Wow, you're sure a wizard!  Most people would need to look before
> making statements like that.

I know, but your news-server doesn't honour cancel messages. :)

Python's mmap does indeed memory map the file into the process image.
It does not fake memory mapping by means of file seek operations.
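
To make the distinction concrete, here is a minimal sketch of reading a byte range through a genuine OS memory map. It is written for current Python 3 rather than the Python of this thread. `read_slice_mmap` is a hypothetical helper, not part of the mmap module. It maps only a window around the requested range, using the documented `offset` argument, which must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os


def read_slice_mmap(path, start, stop):
    """Return bytes [start:stop) of the file at `path` via a real OS memory map.

    Hypothetical helper; assumes 0 <= start. Only a window covering the
    requested range is mapped, not the whole file.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        stop = min(stop, size)  # clamp to end of file, as bytes slicing would
        if start >= stop:
            return b""
        # The mmap offset must be a multiple of ALLOCATIONGRANULARITY
        gran = mmap.ALLOCATIONGRANULARITY
        base = (start // gran) * gran
        with mmap.mmap(f.fileno(), stop - base,
                       access=mmap.ACCESS_READ, offset=base) as mm:
            # Slicing an mmap object copies that region into a bytes object
            return mm[start - base:stop - base]
```

The windowing matters for the large-file case in the subject line: a single view of a file larger than the process address space cannot be mapped, but a small window at an arbitrary offset can.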

However, "memory mapping" a file by means of fseek() is probably more
efficient than using UNIX's mmap() or Windows'
CreateFileMapping()/MapViewOfFile(). In Python, we don't always need
the file to be memory mapped. Normally we just want to use slicing
operators, for-loops and other goodies on the file object, i.e. we just
want to treat the file as a Python container object. There are many
ways of achieving that.

We can implement a container object backed by a binary file just as
efficiently (and possibly even more efficiently) without using the OS's
memory mapping facilities. The major advantage is that we can
"pseudo-memory map" a lot more than a 32-bit address space can harbour,
because a file offset, unlike a pointer, is not limited by the size of
the process's address space.
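
One way such a container might look, sketched in Python 3 under the assumption that read-only byte access is enough. `FileArray` is a hypothetical class, not part of any library. Integer indexing and slicing are translated into seek()/read() calls on an ordinary file object, and iteration streams the file in chunks, so no OS memory map is involved.

```python
import operator
import os


class FileArray:
    """Read-only, sequence-like view of a binary file (hypothetical sketch).

    Indexing and slicing are served by seek()/read() on a plain file object
    rather than by an OS memory map.
    """

    def __init__(self, path, chunk_size=1 << 16):
        self._f = open(path, "rb")
        self._size = os.fstat(self._f.fileno()).st_size
        self._chunk = chunk_size  # buffer size used when iterating

    def __len__(self):
        return self._size

    def _byte_at(self, i):
        # One seek plus a one-byte read; returns an int, like bytes/mmap indexing
        self._f.seek(i)
        return self._f.read(1)[0]

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Normalise negative or missing bounds against the file length
            start, stop, step = key.indices(self._size)
            if step == 1:
                if stop <= start:
                    return b""
                self._f.seek(start)
                return self._f.read(stop - start)
            # Extended slices fall back to one read per selected byte
            return bytes(self._byte_at(i) for i in range(start, stop, step))
        key = operator.index(key)  # reject floats and other non-integers
        if key < 0:
            key += self._size
        if not 0 <= key < self._size:
            raise IndexError("FileArray index out of range")
        return self._byte_at(key)

    def __iter__(self):
        # Stream the file in chunks so a for-loop never loads it all at once
        pos = 0
        while pos < self._size:
            self._f.seek(pos)  # re-seek: indexing may have moved the position
            chunk = self._f.read(min(self._chunk, self._size - pos))
            if not chunk:
                break
            yield from chunk
            pos += len(chunk)

    def close(self):
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Usage mirrors an mmap object: `fa[:16]` returns bytes, `fa[-1]` returns an int, and `for b in fa:` walks the whole file. The trade-off is that each random access costs a seek/read call rather than a page fault; in exchange, only the bytes actually requested are ever held in memory.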


However, as I wrote in another posting, memory mapping may also be
used to create shared memory on Windows, and that doesn't fit easily
into the fseek scheme. But apart from that, I don't see why true memory
mapping has any real advantage in Python. As long as slicing operators
work, users will probably not be able to tell the difference.

There is in any case room for improving Python's mmap object.




More information about the Python-list mailing list