Scanning a file
Paul Watson
pwatson at redlinepy.com
Sat Oct 29 15:15:46 EDT 2005
"Mike Meyer" <mwm at mired.org> wrote in message
news:864q70evci.fsf at bhuda.mired.org...
> "Paul Watson" <pwatson at redlinepy.com> writes:
...
> Did you do timings on it vs. mmap? Having to copy the data multiple
> times to deal with the overlap - thanks to strings being immutable -
> would seem to be a lose, and makes me wonder how it could be faster
> than mmap in general.
The only thing copied is a string one byte shorter than the search string,
once per block.
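
A minimal sketch of that overlap handling, for illustration only. This is not
the scanfile.py timed below; the function name count_matches, the block_size
default, and the choice to count every occurrence (including overlapping ones)
are all assumptions. It carries the last len(pattern) - 1 bytes of each buffer
into the next one, so a match that straddles a block boundary is still seen
exactly once.

```python
import io


def count_matches(stream, pattern, block_size=1 << 20):
    """Count occurrences of pattern in a binary stream, reading block by block.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1   # bytes carried over between blocks
    tail = b""
    total = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # The only extra copy: the short tail is prepended to the new block.
        buf = tail + block
        pos = buf.find(pattern)
        while pos != -1:
            total += 1
            pos = buf.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so it can never hold a whole
        # match by itself and nothing is counted twice.
        tail = buf[-keep:] if keep else b""
    return total


if __name__ == "__main__":
    data = io.BytesIO(b"xxabcxxabcabc")
    print(count_matches(data, b"abc", block_size=4))
```

With a real file, the stream would be opened in binary mode, for example
open("t.dat", "rb").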
I did not do due diligence with respect to timings. Here is a small
dataset, read sequentially and then using mmap.
$ ls -lgG t.dat
-rw-r--r-- 1 16777216 Oct 28 16:32 t.dat
$ time ./scanfile.py
1048576
0.80s real 0.64s user 0.15s system
$ time ./scanfilemmap.py
1048576
20.33s real 6.09s user 14.24s system
With a larger file, the system time skyrockets. I assume that is due to the
paging mechanism in the OS. This is Cygwin on Windows XP.
$ ls -lgG t2.dat
-rw-r--r-- 1 268435456 Oct 28 16:33 t2.dat
$ time ./scanfile.py
16777216
28.85s real 16.37s user 0.93s system
$ time ./scanfilemmap.py
16777216
323.45s real 94.45s user 227.74s system
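
For reference, an mmap-based scan along the lines being compared could look
roughly like the sketch below. It is not the scanfilemmap.py that produced the
numbers above; the function name count_matches_mmap and the decision to count
every occurrence are assumptions. It uses only the standard mmap module, so no
explicit overlap handling is needed, at the cost of letting the OS page the
file in.

```python
import mmap


def count_matches_mmap(path, pattern):
    """Count occurrences of pattern in the file at path via a read-only mmap.

    Hypothetical helper: the name is an illustrative assumption.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    total = 0
    with open(path, "rb") as f:
        try:
            # Length 0 maps the whole file.
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # An empty file cannot be mapped; it contains no matches.
            return 0
        with m:
            pos = m.find(pattern)
            while pos != -1:
                total += 1
                pos = m.find(pattern, pos + 1)
    return total
```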