Efficient grep using Python?
tim.peters at gmail.com
Wed Dec 15 17:21:44 CET 2004
["sf" <sf at sf.sf>]
>> I have files A, and B each containing say 100,000 lines (each
>> line=one string without any space)
>> I want to do
>> " A - (A intersection B) "
>> Essentially, want to do efficient grep, i.e. from A remove those
>> lines which are also present in file B.
> that's an unusual definition of "grep", but the following seems to
> do what you want:
>     afile = "a.txt"
>     bfile = "b.txt"
>
>     bdict = dict.fromkeys(open(bfile).readlines())
>     for line in open(afile):
>         if line not in bdict:
>             print line,
Note that an open file is an iterable object, yielding the lines in
the file. The "for" loop exploited that above, but fromkeys() can
also exploit it. That is,

    bdict = dict.fromkeys(open(bfile))

is good enough (there's no need for the .readlines()).
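
For readers on a current Python, the same idea can be sketched roughly as below. This is only an illustration, not the code from the thread: it assumes Python 3, swaps the dict for a set (which gives the same constant-time membership test), and the function name lines_only_in_a is made up for the example.

```python
def lines_only_in_a(a_path, b_path):
    """Yield the lines of a_path that do not also appear in b_path.

    Hypothetical helper for illustration; the name and signature are
    not from the original post.
    """
    # An open file is iterable and yields its lines, so set() can
    # consume it directly; no .readlines() call is needed.
    with open(b_path) as b_file:
        b_lines = set(b_file)

    # Stream file A line by line; each membership test against the
    # set takes constant time on average.
    with open(a_path) as a_file:
        for line in a_file:
            if line not in b_lines:
                yield line
```

Calling print(line, end="") on each yielded line reproduces the output of the loop quoted above. As in the quoted code, lines are compared with their trailing newline intact, so a final line that lacks a newline will not match an otherwise identical line that has one.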