Scanning a file

Paul Watson pwatson at
Sat Oct 29 21:15:46 CEST 2005

"Mike Meyer" <mwm at> wrote in message 
news:864q70evci.fsf at
> "Paul Watson" <pwatson at> writes:
> Did you do timings on it vs. mmap? Having to copy the data multiple
> times to deal with the overlap - thanks to strings being immutable -
> would seem to be a lose, and makes me wonder how it could be faster
> than mmap in general.

The only thing copied for each block is a string one byte shorter than the 
search string.
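
To make the overlap idea concrete, here is a minimal sketch in current Python 3 (not the script that was timed, which is not shown in this message). The name find_in_stream and the block_size default are assumptions chosen for illustration. Each block is searched in place, and only the last len(pattern) - 1 bytes are carried over to check for a match that straddles a block boundary.

```python
def find_in_stream(stream, pattern, block_size=1 << 20):
    """Return the offset of the first match of pattern in a binary stream, or -1.

    Hypothetical helper for illustration; block_size is an arbitrary default.
    """
    if not pattern:
        return 0
    overlap = len(pattern) - 1   # bytes carried over from one block to the next
    tail = b""                   # last `overlap` bytes seen so far (may be shorter)
    block_start = 0              # stream offset of the current block
    while True:
        block = stream.read(block_size)
        if not block:
            return -1
        if tail:
            # Only this small boundary string is built by copying.
            boundary = tail + block[:overlap]
            pos = boundary.find(pattern)
            if pos != -1:
                return block_start - len(tail) + pos
        # The block itself is searched without being copied.
        pos = block.find(pattern)
        if pos != -1:
            return block_start + pos
        if overlap:
            if len(block) < overlap:
                tail = (tail + block)[-overlap:]
            else:
                tail = block[-overlap:]
        block_start += len(block)
```

It would be called on a file opened in binary mode, for example find_in_stream(open("t.dat", "rb"), b"some bytes").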

I did not do due diligence with respect to timings.  Here is a small 
dataset, read sequentially and then using mmap.

$ ls -lgG t.dat
-rw-r--r--  1 16777216 Oct 28 16:32 t.dat
$ time  ./
    0.80s real     0.64s user     0.15s system
$ time  ./
   20.33s real     6.09s user    14.24s system

With a larger file, the system time skyrockets. I assume that is due to the 
paging mechanism in the OS.  This is Cygwin on Windows XP.

$ ls -lgG t2.dat
-rw-r--r--  1 268435456 Oct 28 16:33 t2.dat
$ time  ./
   28.85s real    16.37s user     0.93s system
$ time  ./
  323.45s real    94.45s user   227.74s system
