Efficient grep using Python?

Tim Peters tim.peters at gmail.com
Wed Dec 15 20:11:17 CET 2004


[Fredrik Lundh]
>>> bdict = dict.fromkeys(open(bfile).readlines())
>>>
>>> for line in open(afile):
>>>    if line not in bdict:
>>>        print line,
>>>
>>> </F>

[Tim Peters]
>> Note that an open file is an iterable object, yielding the lines in
>> the file.  The "for" loop exploited that above, but fromkeys() can
>> also exploit it.  That is,
>>
>> bdict = dict.fromkeys(open(bfile))
>>
>> is good enough (there's no need for the .readlines()).

[/F] 
> (sigh.  my brain knows that, but my fingers keep forgetting)
> 
> and yes, for this purpose, "dict.fromkeys" can be replaced
> with "set".
>
>    bdict = set(open(bfile))
>
> (and then you can save a few more bytes by renaming the
> variable...)

Except the latter two are just shallow spelling changes.  Switching
from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
interesting, since it can allow a major reduction in memory use:
readlines() builds a list of every line before fromkeys() sees the
first one, while iterating over the file hands lines over one at a
time.  Even if all the lines in the file are pairwise distinct, not
materializing them into a giant list can be a significant win.  I
wouldn't have bothered replying if the only point were that you can
save a couple of bytes of typing <wink>.
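
For anyone who wants the whole thing in one place, here is a minimal
sketch of the approach discussed above.  It is written for Python 3
(the snippets quoted in this thread use the Python 2 print statement),
and the function name lines_only_in_a and its parameter names are made
up for illustration; they don't come from the thread.

```python
def lines_only_in_a(a_path, b_path):
    """Yield each line of a_path that does not occur in b_path.

    Hypothetical helper: the name and signature are illustrative only.
    """
    with open(b_path) as b_file:
        # Iterating over the file object hands lines to set() one at a
        # time, so no intermediate list of all lines is ever built
        # (unlike b_file.readlines()).
        b_lines = set(b_file)

    with open(a_path) as a_file:
        # The second file is streamed too; only b's distinct lines are
        # held in memory while we filter.
        for line in a_file:
            if line not in b_lines:
                yield line
```

Lines are compared with their trailing newlines intact, so a final
line lacking a newline won't match an otherwise identical line that
has one.  To print the result, something like
sys.stdout.writelines(lines_only_in_a(afile, bfile)) should do.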


