Efficient grep using Python?

Tim Peters tim.peters at gmail.com
Wed Dec 15 17:21:44 CET 2004

["sf" <sf at sf.sf>]
>> I have files A, and B each containing say 100,000 lines (each
>> line=one string without any space)
>> I want to do
>> "  A  - (A intersection B)  "
>> Essentially, want to do efficient grep, i.e. from A remove those
>> lines which are also present in file B.

[Fredrik Lundh]
> that's an unusual definition of "grep", but the following seems to
> do what you want:
> afile = "a.txt"
> bfile = "b.txt"
> bdict = dict.fromkeys(open(bfile).readlines())
> for line in open(afile):
>    if line not in bdict:
>        print line,
> </F> 

Note that an open file is an iterable object, yielding the lines in
the file.  The "for" loop exploited that above, but fromkeys() can
also exploit it.  That is,

bdict = dict.fromkeys(open(bfile))

is good enough (there's no need for the .readlines()).
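
For anyone who wants the whole thing in one place, here is a minimal sketch of the same idea as a function, written for current Python 3 rather than the Python 2 of the quoted code. The function name lines_only_in_a and its two path parameters are made up for illustration; they are not from the posts above. It assumes both files fit comfortably in memory and that lines are compared exactly, trailing newline included.

```python
def lines_only_in_a(a_path, b_path):
    """Return the lines of a_path that do not appear in b_path, in order.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not part of the original posts.
    """
    # An open file is iterable, so fromkeys() can consume it directly;
    # no intermediate list from readlines() is needed.
    with open(b_path) as bfile:
        b_lines = dict.fromkeys(bfile)

    # Lines keep their trailing newline, so a final line without one
    # will not match the same text followed by "\n".
    with open(a_path) as afile:
        return [line for line in afile if line not in b_lines]
```

Something like "".join(lines_only_in_a("a.txt", "b.txt")) would then give the filtered text. A set built with set(bfile) should serve equally well here, since only membership tests are needed.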
