How to remove subset from a file efficiently?
unixfd0.n0spam at yahoo.com
Fri Jan 13 16:44:09 CET 2006
On 12 Jan 2006 22:29:22 -0800
"Raymond Hettinger" <python at rcn.com> wrote:
> AJL wrote:
> > How fast does this run?
> > a = set(file('PSP0000320.dat'))
> > b = set(file('CBR0000319.dat'))
> > file('PSP-CBR.dat', 'w').writelines(a.difference(b))
> Turning PSP into a set takes extra time, consumes unnecessary memory,
> eliminates duplicates (possibly a bad thing), and loses the original
> input ordering (probably a bad thing).
> To jam the action into a couple lines, try this:
> b = set(file('CBR0000319.dat'))
The OP said "assume machine has plenty memory". ;)
I saw some solutions that used sets and was wondering why they stopped
at using a set for the first file and not the second, when the problem
is really a set problem. I can see the reasoning behind it now.
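
For reference, here is a minimal sketch of that reasoning as I understand it: only the file being subtracted becomes a set, and the main file is streamed line by line, so its ordering and duplicates survive. This is not the code from the quoted post, which is cut off above. The function name remove_subset and the throwaway file names and contents are made up for illustration, and it uses open() as in current Python rather than file().

```python
import os
import tempfile


def remove_subset(main_path, subset_path, out_path):
    """Write the lines of main_path that do not occur in subset_path."""
    # Only the file being subtracted is turned into a set.
    with open(subset_path) as subset_file:
        unwanted = set(subset_file)
    # The main file is streamed, so input order and duplicates are kept.
    with open(main_path) as main_file, open(out_path, "w") as out_file:
        out_file.writelines(line for line in main_file if line not in unwanted)


# Tiny demonstration with throwaway files (hypothetical contents).
workdir = tempfile.mkdtemp()
main_path = os.path.join(workdir, "main.dat")
subset_path = os.path.join(workdir, "subset.dat")
out_path = os.path.join(workdir, "out.dat")

with open(main_path, "w") as f:
    f.write("c\na\nb\na\nd\n")
with open(subset_path, "w") as f:
    f.write("b\nd\n")

remove_subset(main_path, subset_path, out_path)

with open(out_path) as f:
    result = f.read()

# For comparison: the pure set difference drops the duplicate and the order.
with open(main_path) as f:
    as_set_difference = set(f) - {"b\n", "d\n"}

print(repr(result))
```

Lines are compared including their trailing newline, so a final line without a newline will not match the same text followed by one.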