[Tutor] Working with files

Fri, 05 Apr 2002 10:38:23 +0200

Hello,

At 19:10 04/04/2002 -0500, you wrote:
>Date: Thu, 4 Apr 2002 18:27:26 -0500
>Subject: Re: [Tutor] Working with files
>From: Erik Price <erikprice@mac.com>

> > import re
> >
> > for line in inp.readlines():
> >   if re.search(r'Canada', line): continue # if line contains 'Canada'
> >   outp.write(line)
>
>The only thing with this is that it wouldn't catch a phrase (or a word)
>that was split between two lines.
>
>I can't think of a good solution that doesn't require the entire
>haystack string to be read into memory.

You could try using 'read(aChunkSize)' instead of 'readlines()'.

Excerpt from the library reference:

read([size])
Read at most size bytes from the file (less if the read hits EOF before 
obtaining size bytes). If the size argument is negative or omitted, read 
all data until EOF is reached. The bytes are returned as a string object. 
An empty string is returned when EOF is encountered immediately.
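
To make the excerpt concrete, here is a small sketch of a loop that reads a file piece by piece and stops on the empty string that signals EOF. The helper name 'read_chunks' is my own invention for illustration, and I use io.StringIO as a stand-in for a real file object:

```python
import io

def read_chunks(stream, size):
    """Collect successive read(size) results until EOF (hypothetical helper)."""
    chunks = []
    while True:
        chunk = stream.read(size)  # at most `size` characters per call
        if not chunk:              # empty string: EOF reached
            break
        chunks.append(chunk)
    return chunks

# The last chunk is shorter because the read hits EOF first.
pieces = read_chunks(io.StringIO("abcdefg"), 3)
```

With a real file you would pass the object returned by open() instead of the StringIO.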

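So I guess you could read the file chunk by chunk and test each pair of 
adjacent chunks together, so that only two chunks are in memory at a time. 
It sounds a bit more complex, since a match may straddle a chunk boundary: 
you'd need to test a+b, b+c, etc. I suppose there is a standard solution 
(anyone? :-)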
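
For what it's worth, here is a rough sketch of the a+b, b+c idea. Instead of keeping the whole previous chunk, it keeps only its last len(needle) - 1 characters, which I believe is enough to catch a match that straddles a boundary. The function name 'contains_phrase' and the default chunk size are assumptions of mine, not anything standard, and it only answers "is the phrase there?". If the phrase is broken by a newline in the file, you would still have to deal with that separately.

```python
import io

def contains_phrase(stream, needle, chunk_size=4096):
    """Return True if `needle` occurs in `stream`, reading chunk by chunk.

    Hypothetical helper: only one chunk plus a short tail of the
    previous chunk is held in memory at any time.
    """
    if not needle:
        return True
    overlap = len(needle) - 1      # longest prefix a boundary can cut off
    tail = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:              # EOF: nothing left to test
            return False
        window = tail + chunk      # end of previous chunk + current chunk
        if needle in window:
            return True
        # Keep just enough of the window to complete a split match.
        tail = window[-overlap:] if overlap else ""

# 'Canada' is split across two 7-character chunks here.
found = contains_phrase(io.StringIO("abc Canada xyz"), "Canada", chunk_size=7)
```

The chunk size can be anything from 1 upward; a larger value just means fewer calls to read().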

Cheers.

Alexandre