Efficient grep using Python?
janeaustine50 at hotmail.com
Thu Dec 16 05:19:45 CET 2004
>>> bdict = dict.fromkeys(open(bfile).readlines())
>>> for line in open(afile):
>>>     if line not in bdict:
>>>         print line,
>> Note that an open file is an iterable object, yielding the lines in
>> the file. The "for" loop exploited that above, but fromkeys() can
>> also exploit it. That is,
>> bdict = dict.fromkeys(open(bfile))
>> is good enough (there's no need for the .readlines()).
> (sigh. my brain knows that, but my fingers keep forgetting)
> and yes, for this purpose, "dict.fromkeys" can be replaced
> with "set".
> bdict = set(open(bfile))
> (and then you can save a few more bytes by renaming the variable)
> Except the latter two are just shallow spelling changes. Switching
> from fromkeys(open(f).readlines()) to fromkeys(open(f)) is much more
> interesting, since it can allow major reduction in memory use. Even
> if all the lines in the file are pairwise distinct, not materializing
> them into a giant list can be a significant win. I wouldn't have
> bothered replying if the only point were that you can save a couple
> bytes of typing <wink>.
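For reference, a minimal sketch of the approach discussed above, written for Python 3 rather than the Python 2 of the quoted code. The function name lines_not_in and the throwaway file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def lines_not_in(a_path, b_path):
    """Return the lines of a_path that do not occur in b_path."""
    with open(b_path) as b_file:
        # set() pulls lines from the file object one at a time,
        # so no intermediate list of lines is built first.
        b_lines = set(b_file)
    with open(a_path) as a_file:
        # Lines keep their trailing newline, as in the quoted code.
        return [line for line in a_file if line not in b_lines]


# Tiny demonstration with throwaway files (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.txt")
    b_path = os.path.join(tmp, "b.txt")
    with open(a_path, "w") as f:
        f.write("alpha\nbeta\ngamma\n")
    with open(b_path, "w") as f:
        f.write("gamma\nalpha\n")
    result = lines_not_in(a_path, b_path)

print(result)
```

Because whole lines are compared, a final line without a trailing newline will not match the same text followed by a newline in the other file.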
fromkeys(open(f).readlines()) and fromkeys(open(f)) seem to be
equivalent.
When I pass an iterator instance (or a generator iterator) to
dict.fromkeys, it is expanded at that moment, so fromkeys(open(f))
is effectively the same as fromkeys(list(open(f))) and
fromkeys(open(f).readlines()).
Am I missing something?
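
One way to check the observation about the result, sketched in Python 3. io.StringIO stands in for a real file here, which is an assumption made only to keep the example self-contained. The sketch compares the resulting dicts and does not measure memory.

```python
import io

# Hypothetical file contents, including a duplicate line.
text = "a\nb\na\n"

# Spelling 1: readlines() first builds a list of every line.
via_list = dict.fromkeys(io.StringIO(text).readlines())

# Spelling 2: fromkeys() iterates over the file-like object directly.
via_iter = dict.fromkeys(io.StringIO(text))

print(via_list == via_iter)
```

The two dicts compare equal, so the spellings do agree on the result. What differs is the intermediate step: readlines() returns a complete list of lines before fromkeys() sees any of them, while iterating the file hands over one line at a time.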