How to remove subset from a file efficiently?
Raymond Hettinger
python at rcn.com
Fri Jan 13 01:29:22 EST 2006
AJL wrote:
> How fast does this run?
>
> a = set(file('PSP0000320.dat'))
> b = set(file('CBR0000319.dat'))
> file('PSP-CBR.dat', 'w').writelines(a.difference(b))
Turning PSP into a set takes extra time, consumes unnecessary memory,
eliminates duplicates (possibly a bad thing), and loses the original
input ordering (probably a bad thing).
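A small sketch of the difference, using made-up in-memory lists in place of the two files (the names psp and cbr and their contents are hypothetical, for illustration only):

```python
# Hypothetical stand-ins for the lines of the two files.
psp = ['b\n', 'a\n', 'b\n', 'c\n']
cbr = ['c\n']

# Set difference: the duplicate 'b\n' collapses and the order is arbitrary.
as_set = set(psp).difference(cbr)

# Filtering against a set built only from cbr keeps duplicates and order.
exclude = set(cbr)
streamed = [line for line in psp if line not in exclude]
```

Here as_set holds only two distinct lines, while streamed still has three lines in their original order.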
To jam the action into a couple lines, try this:
import itertools
b = set(file('CBR0000319.dat'))   # only the lines to exclude go into a set
file('PSP-CBR.dat', 'w').writelines(
    itertools.ifilterfalse(b.__contains__, file('PSP0000320.dat')))
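For readers on Python 3, where file() and itertools.ifilterfalse no longer exist, roughly the same idea might look like the sketch below. The function name remove_subset and its parameters are my own invention for illustration, not part of any library.

```python
import itertools


def remove_subset(source_path, exclude_path, output_path):
    """Write the lines of source_path that do not occur in exclude_path.

    Hypothetical helper, not from the original post.
    """
    # Only the exclusion file is held in memory, as a set of whole lines.
    with open(exclude_path) as exclude_file:
        exclude = set(exclude_file)

    # The source file is streamed line by line, so its ordering and any
    # duplicate lines are preserved in the output.
    with open(source_path) as source, open(output_path, 'w') as output:
        output.writelines(
            itertools.filterfalse(exclude.__contains__, source))
```

It could be called as remove_subset('PSP0000320.dat', 'CBR0000319.dat', 'PSP-CBR.dat'). Lines are compared with their trailing newlines included, so a final line that lacks a newline in one file will not match the same text followed by a newline in the other.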
Raymond