python: ascii read

Heiko Wundram heikowu at ceosg.de
Thu Sep 16 12:56:23 EDT 2004


On Thursday, 16 September 2004 17:56, Brian van den Broek wrote:
> But I don't really feel I've a handle on the significance of saying it
> maps the file into memory versus reading the file. The naive thought is
> that since the data gets into memory, the file must be read. But this
> makes me sure I'm missing a distinction in the terminology. Explanations
> and pointers for what to read gratefully received.

read()ing a file into memory does what it says; it reads the binary data from 
the disk all at once, and allocates main memory (as needed) to fit all the 
data there. Memory mapping a file (or device or whatever) means that the 
virtual memory architecture is involved. What happens here:

mmapping a file creates virtual memory pages (just like the virtual memory 
that can be put into your paging file), which are initially marked as absent 
in the page tables that the processor's MMU consults.

Now, when the program tries to access the memory page (pages have some fixed, 
short length, like 4k on most Pentium-style computers), a page fault is 
generated by the MMU, which invokes the operating system's handler for page 
faults. The operating system sees which page was accessed (from the page 
address it can deduce the offset in the file that you're trying to access), 
loads the corresponding page from disk, puts it into physical memory at some 
position, and marks the page-table entry as present.

Future accesses to the page will take place immediately (without a page fault 
taking place).
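
To make this concrete, here is a minimal sketch using Python's standard mmap 
module. The scratch file and the helper names page_of and demo_lazy_access 
are made up for illustration. Note that the OS is free to read ahead, so 
"only the touched page is loaded" is the model, not a guarantee.

```python
import mmap
import os
import tempfile

def page_of(offset, page_size=mmap.PAGESIZE):
    """Return (page index, offset within that page) for a byte offset."""
    return divmod(offset, page_size)

def demo_lazy_access():
    # Build a scratch file spanning four pages, with a marker in the third.
    size = 4 * mmap.PAGESIZE
    target = 2 * mmap.PAGESIZE + 5
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"\0" * size)
            f.seek(target)
            f.write(b"X")
            f.flush()
            # Creating the mapping only sets up address space; nothing is
            # copied into a Python bytes object at this point.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                # Indexing touches the page containing `target`; if that
                # page is not resident, the OS loads it on the page fault.
                return m[target:target + 1], len(m)
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(page_of(2 * mmap.PAGESIZE + 5))
    print(demo_lazy_access())
```

Compare this with f.read(), which would copy all four pages into a bytes 
object before you could look at a single byte.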

Changes in memory are written back to disk when the page is flushed, which 
happens at the latest when it gets evicted from main memory because too few 
pages of real main memory are available. When a page is evicted like this 
(rather than because the mmap was closed), the operating system marks the 
page-table entry as absent again, and the next time the program tries to 
access this location, a page fault takes place again, and the OS can reload 
the page from disk.
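
From Python you normally don't wait for eviction; you can ask for the 
write-back yourself with the mapping's flush() method. A small sketch, 
assuming a scratch file and a helper name (write_through_mapping) that I 
made up for the example:

```python
import mmap
import os
import tempfile

def write_through_mapping():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"hello world")
            f.flush()
            # Default access is a shared read/write mapping of the file.
            with mmap.mmap(f.fileno(), 0) as m:
                m[0:5] = b"HELLO"  # changes the page in memory
                m.flush()          # asks the OS to write changed pages back
        # Re-read through the ordinary file API to see the change on disk.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(write_through_mapping())
```

Slice assignment on a mapping has to keep the length the same; a mapping 
cannot grow the file this way.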

For speed, the operating system allows you to mmap read-only, which means that 
once a page is discarded, it never needs to be written back to disk (which 
of course is faster). Most MMUs (including the Pentium-class MMU) also set a 
dirty bit on the page-table entry once the page has been altered; the OS uses 
this to decide whether a page needs to be written back to disk at all.
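
In Python, a read-only mapping is requested with access=mmap.ACCESS_READ. 
The sketch below (the helper name try_write_readonly is just for 
illustration) shows that reads work as usual, while an attempted write is 
rejected with a TypeError:

```python
import mmap
import os
import tempfile

def try_write_readonly():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(b"abc")
            f.flush()
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                first = m[0:1]  # reading is fine
                try:
                    m[0:1] = b"z"  # writing to a read-only map is refused
                except TypeError:
                    return first, True
                return first, False
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(try_write_readonly())
```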

So, basically what you get is load-on-demand file handling, which is similar 
to what the paging file (virtual memory file) on win32 does for ordinary 
memory. Internally, the machinery that handles mmapped files and virtual 
memory is the same, and you could think of the swap file as a file mmapped 
by the operating system, from which programs allocate slices through some 
OS calls (well, in practice through the normal malloc/calloc calls).
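
You can see the kinship from Python, too: passing -1 as the file descriptor 
gives an anonymous mapping, i.e. plain memory with no named file behind it, 
which the OS may page out to swap like any other memory. This is only an 
illustration of the idea (the helper name anonymous_scratch is made up), not 
a claim about how Python's own allocator gets its memory.

```python
import mmap

def anonymous_scratch(n_pages=2):
    size = n_pages * mmap.PAGESIZE
    # fileno -1 requests an anonymous mapping: no file on disk backs it.
    with mmap.mmap(-1, size) as m:
        m[0:4] = b"data"
        # Untouched anonymous memory starts out zero-filled.
        return m[0:4], m[size - 1:size], len(m)

if __name__ == "__main__":
    print(anonymous_scratch())
```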

HTH!

Heiko.


