using mmap on large (> 2 Gig) files
pandyacus.xspam at xspam.sbcglobal.net
Thu Oct 26 03:25:38 EDT 2006
Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
> "sturlamolden" <sturlamolden at yahoo.no> writes:
>> However, "memory mapping" a file by means of fseek() is probably more
>> efficient than using UNIX' mmap() or Windows'
> Why on earth would you think that?! It is counterintuitive. fseek beyond
> whatever is buffered in stdio (usually no more than 1kbyte or so)
> requires a system call, while mmap is just a memory access.
And then there is the buffer copy required on every I/O to or from the application.
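A minimal Python sketch of the two access patterns being compared follows. It assumes a throwaway temporary file, and the helper names `read_with_seek`, `read_with_mmap` and `compare` are illustrative only. It only checks that both paths return the same bytes and makes no timing claims.

```python
import mmap
import os
import random
import tempfile


def read_with_seek(f, offsets, size):
    # Each seek()/read() pair goes through the file object's buffer; for an
    # offset outside that buffer this means a read() system call that copies
    # data from the kernel into user space.
    chunks = []
    for off in offsets:
        f.seek(off)
        chunks.append(f.read(size))
    return chunks


def read_with_mmap(mm, offsets, size):
    # Slicing the mapping reads the mapped pages directly; the kernel is
    # involved mainly when a page that is not yet resident is touched
    # (a page fault). The slice itself still copies into a new bytes object.
    return [mm[off:off + size] for off in offsets]


def compare(file_size=1 << 20, count=1000, size=16, seed=0):
    rng = random.Random(seed)
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(file_size))
        f.flush()
        offsets = [rng.randrange(file_size - size) for _ in range(count)]
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            via_seek = read_with_seek(f, offsets, size)
            via_mmap = read_with_mmap(mm, offsets, size)
    return via_seek == via_mmap


if __name__ == "__main__":
    print(compare())
```

To measure the difference on a given system, wrap each helper with `timeit`; the outcome depends on the OS, the page cache and the access pattern.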
>> In Python, we don't always need the file memory mapped, we normally
>> just want to use slicing-operators, for-loops and other goodies on
>> the file object -- i.e. we just want to treat the file as a Python
>> container object. There are many ways of achieving that.
> Some of the time we want to share the region with other processes.
> Sometimes we just want random access to a big file on disk without
> having to do a lot of context switches seeking around in the file.
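Random access of that kind can be sketched with the `offset` argument that current Python's `mmap.mmap` accepts. Only a window of the file is mapped, so a file larger than 2 GB can be handled even on a 32-bit build where the whole file may not fit in the address space. This is a sketch under those assumptions; `read_window` is a hypothetical helper name.

```python
import mmap
import os
import tempfile


def read_window(path, offset, length):
    """Return `length` bytes starting at `offset`, mapping only a window of
    the file rather than the whole thing."""
    if length == 0:
        return b""
    gran = mmap.ALLOCATIONGRANULARITY
    start = (offset // gran) * gran  # mmap's offset must be a multiple of this
    delta = offset - start           # where `offset` falls inside the window
    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        if offset < 0 or length < 0 or offset + length > file_size:
            raise ValueError("requested range lies outside the file")
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=start) as mm:
            return mm[delta:delta + length]


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        print(read_window(tmp.name, 70000, 10) == data[70000:70010])
    finally:
        os.unlink(tmp.name)
```

Aligning `start` down to `mmap.ALLOCATIONGRANULARITY` (the page size on Unix, typically 64 KiB on Windows) is what lets the caller ask for an arbitrary byte offset.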
>> There is in any case room for improving Python's mmap object.
> IMO it should have some kind of IPC locking mechanism added, in
> addition to the offset stuff suggested.
The type of IPC required depends on who else is using the shared region:
another Python process or an external program. Spinlock primitives aside,
the other types of synchronization mechanisms are already provided by
the OS. However, I do see value in providing a shared-memory-based
spinlock mechanism. Such services can be built on top of the shared memory
infrastructure. I am not sure what kind of real-world Python applications use
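The following sketch layers an OS-provided lock on top of a shared file mapping. It is not the spinlock discussed above. It substitutes POSIX advisory locking via `fcntl.flock` (Unix only, and `os.fork` is assumed available) to guard a counter stored in the mapped region. The names `locked_increment`, `run` and `COUNTER` are hypothetical.

```python
import fcntl
import mmap
import os
import struct
import tempfile

COUNTER = struct.Struct("<q")  # one little-endian 64-bit counter at offset 0


def locked_increment(path, times):
    # Each process opens the file itself, so each gets its own open file
    # description and flock() actually excludes the other processes.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), COUNTER.size) as mm:
        for _ in range(times):
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks, no busy-waiting
            try:
                (value,) = COUNTER.unpack_from(mm, 0)
                COUNTER.pack_into(mm, 0, value + 1)
            finally:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)


def run(workers=4, times=1000):
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(bytes(COUNTER.size))  # the mapping needs a non-empty file
        pids = []
        for _ in range(workers):
            pid = os.fork()
            if pid == 0:  # child: do the work, then exit without cleanup
                status = 0
                try:
                    locked_increment(path, times)
                except BaseException:
                    status = 1
                os._exit(status)
            pids.append(pid)
        for pid in pids:
            _, status = os.waitpid(pid, 0)
            if status != 0:
                raise RuntimeError("worker failed")
        with open(path, "rb") as f:
            return COUNTER.unpack(f.read(COUNTER.size))[0]
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(run())
```

Unlike a spinlock, `flock()` sleeps in the kernel instead of busy-waiting, and it is advisory: it only protects against processes that take the same lock. A genuine spinlock on the mapped memory would need an atomic compare-and-swap, which the standard `mmap` module does not provide.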