parsing a long text file for specific text

Tim Roberts timr at probo.com
Wed Jan 30 02:37:27 EST 2002


"Jim Ragsdale" <overlord at netdoor.com> wrote:
>
>In the past, when I have done anything like this, I used a loop to check
>whether it matched a piece of text, but this takes a while. Is there a better
>way? Any thoughts? Any comments would be appreciated, thanks!

Here's one comment.  There are several ways to scan through all the lines
of a file.  The most obvious is this:

   f = open('filexxx', 'r')

   for ln in f.readlines():      # builds a list of every line first
      pass                       # process ln here

The problem with this is that readlines() reads the ENTIRE file into a list
in memory, and only then starts feeding the lines, one at a time, into the
loop.  For long files this tends to be the slowest method.
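Applied to the original question, the readlines() approach might look like
the minimal sketch below.  It assumes Python 3 (newer than this post), and
the function name find_with_readlines and the plain substring test
(needle in ln) are illustrative assumptions, not part of the original post.

```python
def find_with_readlines(path, needle):
    """Return (line_number, line) pairs for every line containing needle.

    readlines() builds a list holding every line of the file before the
    loop starts, so memory use grows with the size of the file.
    """
    matches = []
    with open(path, 'r') as f:
        for number, ln in enumerate(f.readlines(), start=1):
            if needle in ln:
                # Strip the trailing newline so results are easy to compare.
                matches.append((number, ln.rstrip('\n')))
    return matches
```

For example, find_with_readlines('filexxx', 'ERROR') would list every line
containing ERROR ('ERROR' being a hypothetical search string).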

One alternative is to read the file in smaller-sized chunks:

   while 1:
      chunk = f.readlines(100000)   # whole lines totalling roughly 100,000 bytes
      if not chunk: break           # empty list means end of file
      for ln in chunk:
         pass                       # process ln here

This performs much better because you've reduced memory thrashing, although
it's not as "pretty".  Recently, another option was added:

   for ln in f.xreadlines():        # hands the loop one line at a time
      pass                          # process ln here

xreadlines hands the loop one line at a time instead of reading the whole
file up front.  In my benchmarks, xreadlines usually comes out as the winner.
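A note for readers on current Python: xreadlines no longer exists in
Python 3, where iterating over the file object itself gives the same
line-at-a-time behaviour.  The sketch below assumes Python 3 and shows both
the chunked readlines(sizehint) loop and plain file iteration applied to the
search; the function names are hypothetical, and the substring test stands
in for whatever matching you actually need.

```python
def find_chunked(path, needle, sizehint=100000):
    """Chunked scan: readlines(sizehint) returns whole lines totalling
    roughly sizehint characters, so only one chunk is in memory at a time."""
    matches = []
    number = 0
    with open(path, 'r') as f:
        while True:
            chunk = f.readlines(sizehint)
            if not chunk:          # empty list means end of file
                break
            for ln in chunk:
                number += 1        # keep a running line count across chunks
                if needle in ln:
                    matches.append((number, ln.rstrip('\n')))
    return matches


def find_iter(path, needle):
    """Line-at-a-time scan: the file object is its own iterator, which
    plays the role xreadlines played in older Pythons."""
    matches = []
    with open(path, 'r') as f:
        for number, ln in enumerate(f, start=1):
            if needle in ln:
                matches.append((number, ln.rstrip('\n')))
    return matches
```

Both functions return the same results; which one is faster on a modern
interpreter is worth measuring on your own files (for instance with the
standard timeit module) rather than assuming the 2002 numbers carry over.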
--
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.


