mmap (was RE: mxTools (was Re: why no "do : until"?))

Tim Peters tim.one at home.com
Wed Jan 10 00:04:17 EST 2001


[Paul Prescod]
> I haven't tried this recently but if I remember correctly, the
> problem is: How would I do a regexp match on "Py..n" on a 1-
> gigabyte file with a fixed amount of "lookahead"? Do I have to
> load in chunks at a time and then do the pattern matching across
> "chunk boundaries" myself? ...

[A.M. Kuchling]
> With the mmapfile module, you should be able to mmap the entire
> gigabyte file, and then do an re.search() on the whole thing.
> Performance is then up to _sre and your OS's virtual memory
> algorithms.

That's what I thought, but on Windows I never got it to work -- always ended
up staring at incomprehensible "The parameter is wrong" Windows msgs.

Just figured out why, so will share it.  There's a small bug, and key info
is missing from the docs.

Here's a working example on Windows (Win98SE; last I heard mmap doesn't work
at all on ME, but nobody knows why).  "gafat" is a large file, with one
occurrence of "zzz" at the very end:

import mmap, re, os
fname = "gafat"

size = os.path.getsize(fname)
print "os.path.getsize() says size is", size, "bytes"

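# Open for update:  a read-only open fails on Windows (see note 1).
f = open(fname, "rb+")
# A size of 0 maps the entire file (see note 2).
mapper = mmap.mmap(f.fileno(), 0)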

print "mmap .size() says size is", mapper.size(), "bytes"

m = re.search("zzz", mapper)
print m.group(0), m.span(0)

Here's a run:

C:\Code\python\dist\src\PCbuild>python mre.py
os.path.getsize() says size is 119800271 bytes
mmap .size() says size is 119800271 bytes
zzz (119800264, 119800267)

C:\Code\python\dist\src\PCbuild>

So that all worked.  Cool!
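
For anyone trying this on Python 3, here's a rough sketch of the same search.  The assumptions are mine, not part of the run above:  the pattern has to be a bytes pattern since the mapping exposes bytes, a small temporary file stands in for "gafat", and find_in_file is a made-up helper name.

```python
import mmap
import os
import re
import tempfile


def find_in_file(path, pattern):
    """Return (matched bytes, span) for the first match in the file, or None.

    Hypothetical helper for illustration; `pattern` must be a bytes pattern.
    """
    # Opened for update, as in the original script.
    with open(path, "r+b") as f:
        # Length 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0) as mapper:
            m = re.search(pattern, mapper)
            if m is None:
                return None
            # group() copies the matched bytes out of the mapping,
            # so the result stays valid after the mapping is closed.
            return m.group(0), m.span(0)


# Small stand-in for "gafat": filler bytes with one "zzz" at the very end.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 100000 + b"zzz")
    demo_path = tmp.name

result = find_in_file(demo_path, rb"zzz")
print(result)
os.remove(demo_path)
```

The regexp engine sees the mapping as one long buffer, so there are no chunk boundaries to handle by hand; paging is left to the OS.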

Notes:

1) Doc needed:  The file *must* be opened for update (r+, w+, rb+ or wb+).
That's because CreateFileMapping is called with PAGE_READWRITE, and that
won't let you increase permissions over the original open mode.  Violating
this yields the baffling "The parameter is wrong" errors from mmap.mmap().

It's unfortunate that the Unix mmap.mmap() takes the "flags" arg before the
"prot" arg; had "prot" come first, we could have implemented it for Windows
too.

2) Doc needed:  Passing a size of 0 makes the maximum size of the mapping
the actual current size of the file.  Don't know whether that's also true on
Unices; on Windows it's intentional; and it's handy to know so it's worth
documenting.

3) Doc bug:  The docs read as if omitting the optional tagname is different
than supplying a tagname of None.  I don't believe that's true; needs mild
rewording.

4) Code bug:  If a tagname isn't supplied, it looks like the code intended
to follow the probable meaning of the docs (by creating a mapping without a
name), but it actually creates a mapping with a name (an empty string).
I'll check in a fix for that tonight.
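
Notes 1 and 2 can be checked with a short sketch.  This assumes a later Python (3.x) than the one discussed here:  the "access" keyword and mmap.ACCESS_READ come from those later releases and are not part of the module as described above, and the temporary file is just a stand-in.

```python
import mmap
import os
import tempfile

# Build a small non-empty file (an empty file cannot be mapped with length 0).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789" * 1000)
    path = tmp.name

file_size = os.path.getsize(path)

# Note 1: open for update, which matches the default read-write mapping.
# Note 2: a length of 0 maps the file's current size.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mapper:
        mapped_len_rw = len(mapper)
        reported_size = mapper.size()

# Assumption: in later releases a file opened read-only can be mapped
# by asking for a read-only mapping explicitly.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapper:
        mapped_len_ro = len(mapper)

os.remove(path)
print(file_size, mapped_len_rw, reported_size, mapped_len_ro)
```

All four numbers should agree, which is the size-0 behavior from note 2; the read-only case sidesteps the open-mode restriction from note 1 rather than contradicting it.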

mmap-will-never-catch-on-if-it-can't-be-used<wink>-ly y'rs  - tim




