Efficient grep using Python?

Tim Peters tim.peters at gmail.com
Wed Dec 15 17:21:44 CET 2004


["sf" <sf at sf.sf>]
>> I have files A and B, each containing, say, 100,000 lines (each
>> line is one string without any spaces).
>>
>> I want to do
>>
>> "  A  - (A intersection B)  "
>>
>> Essentially, I want to do an efficient grep, i.e. remove from A those
>> lines which are also present in file B.

[Fredrik Lundh]
> that's an unusual definition of "grep", but the following seems to
> do what you want:
>
> afile = "a.txt"
> bfile = "b.txt"
>
> bdict = dict.fromkeys(open(bfile).readlines())
>
> for line in open(afile):
>    if line not in bdict:
>        print line,
> 
> </F> 
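The quoted loop is Python 2 (`print line,`). Here is a minimal Python 3 sketch of the same idea. The function name `lines_not_in` is made up for illustration. It swaps `dict.fromkeys()` for a `set`, since only membership is ever tested. It also strips the trailing newline before comparing, which is a small departure from the loop above, so a last line that lacks a `\n` still matches.

```python
import io
from typing import Iterable, Iterator


def lines_not_in(a_lines: Iterable[str], b_lines: Iterable[str]) -> Iterator[str]:
    """Yield each line of a_lines whose text does not occur in b_lines."""
    # Hash every line of B once; later membership tests are O(1) on average.
    # rstrip("\n") lets a final line with no newline compare equal to one with it.
    exclude = {line.rstrip("\n") for line in b_lines}
    # Stream A one line at a time, keeping its order and any duplicates.
    for line in a_lines:
        if line.rstrip("\n") not in exclude:
            yield line


if __name__ == "__main__":
    # io.StringIO stands in for open("a.txt") / open("b.txt") so this runs as-is.
    a = io.StringIO("apple\nbanana\ncherry\n")
    b = io.StringIO("banana\ndate")
    print("".join(lines_not_in(a, b)), end="")  # prints apple, then cherry
```

Only B's lines are held in memory, and A is filtered in one streaming pass. Unlike `set(a) - set(b)`, this keeps A's original order and duplicate lines. With real files, you would pass `open("a.txt")` and `open("b.txt")` in place of the `io.StringIO` objects.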

Note that an open file is an iterable object, yielding the lines in
the file.  The "for" loop exploited that above, but fromkeys() can
also exploit it.  That is,

bdict = dict.fromkeys(open(bfile))

is good enough (there's no need for the .readlines()).
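A quick check of that point follows. It assumes `io.StringIO` as an in-memory stand-in for an open file, so the snippet runs without b.txt; both iterate line by line. The key order shown relies on dicts keeping insertion order, which holds in Python 3.7+.

```python
import io

# io.StringIO stands in for open("b.txt"); both yield one line per iteration.
text = "alpha\nbeta\nalpha\ngamma\n"

# Passing the file object directly: fromkeys() consumes it line by line.
direct = dict.fromkeys(io.StringIO(text))

# The .readlines() route first builds a full list of lines, then the dict.
via_list = dict.fromkeys(io.StringIO(text).readlines())

assert direct == via_list
# Duplicate lines collapse to a single key; every value defaults to None.
print(list(direct))  # ['alpha\n', 'beta\n', 'gamma\n']
```

Both give the same dict. The direct form simply skips building the intermediate list.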
