regex over files

Bengt Richter bokr at oz.net
Thu Apr 28 22:04:46 EDT 2005


On Thu, 28 Apr 2005 20:35:43 +0000, Robin Becker <robin at SPAMREMOVEjessikat.fsnet.co.uk> wrote:

>Jeremy Bowers wrote:
>....
> >
> > As you try to understand mmap, make sure your mental model can take into
> > account the fact that it is easy and quite common to mmap a file several
> > times larger than your physical memory, and it does not even *try* to read
> > the whole thing in at any given time. You may benefit from
> > reviewing/studying the difference between virtual memory and physical
> > memory.
>I've been using vm systems for 30 years and I suspect my mental model is a bit
>decrepit. However, as convincingly demonstrated by testing my mental model seems
>able to predict low memory problems. When systems run out of memory they tend to
>perform poorly. I'm not sure the horrible degradation I see with large files is
>necessary, but I know it occurs on at least two common vm implementations.

It's interesting. One could envisage an mmap that would have a parameter for its own
lru working set max page count, so mmap would only displace up to that many
pages from normal system paged-in file data. Then you could induce extra
reads by referring back to abandoned mmap-lru pages, but you wouldn't be
displacing anything else, and if you were moving sequentially and staying within
your page residency count allotment, things would work like the best of both worlds
(assuming this idea doesn't have a delusion-busting gotcha lurking ;-) 
But this kind of partitioning of VM lru logic would take some kernel changes IWT.
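
Short of kernel changes, a rough user-space approximation is to map only one
fixed-size window of the file at a time and run the regex over each window in
turn. The sketch below is only an illustration under stated assumptions: the
name windowed_search and its window/overlap parameters are made up for this
example, the pattern is assumed to be a bytes pattern that never matches the
empty string, and no match is assumed to be longer than overlap bytes. It
bounds how much of the file is mapped at once; the kernel may still keep
already-read pages cached, so it is not the per-mmap lru limit described above.

```python
import mmap
import os
import re


def windowed_search(path, pattern, window=1 << 20, overlap=1 << 10):
    """Yield (absolute_offset, matched_bytes) for each match in the file.

    Hypothetical helper: maps at most window + overlap bytes at a time.
    Assumes a bytes pattern with no empty matches and no match longer
    than `overlap` bytes (longer ones may be truncated or missed).
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap offsets must be multiples of the allocation granularity,
    # so round the window down to a multiple of it (at least one unit).
    window = max(gran, (window // gran) * gran)
    rx = re.compile(pattern)
    size = os.path.getsize(path)
    resume = 0  # absolute offset just past the last reported match
    with open(path, "rb") as f:
        start = 0
        while start < size:
            length = min(window + overlap, size - start)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=start) as m:
                # Copy what we need out of the map before it is closed.
                hits = [(mo.start(), mo.end(), mo.group())
                        for mo in rx.finditer(m, max(0, resume - start))]
            for rel_start, rel_end, text in hits:
                if rel_start >= window:
                    # Starts in the overlap tail: the next window reports it.
                    break
                yield start + rel_start, text
                resume = start + rel_end
            start += window
```

Something like list(windowed_search("big.log", rb"ERROR \d+")) would then walk
the file sequentially, with "big.log" standing in for whatever file is being
searched. A match that begins near the end of one window is still seen whole
because each mapping extends overlap bytes past the window boundary.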

IIRC, don't mmap VM access ideas date back to Multics at least?
Anyway, what with fancy controllers as well as fancy file systems and kernels,
it's easy to get hard-to-interpret results, but your large-file examples seem
pretty conclusive.

Regards,
Bengt Richter


