a program to delete duplicate files
David Eppstein
eppstein at ics.uci.edu
Tue Mar 15 01:14:16 EST 2005
In article <871xaisdqz.fsf at pobox.com>, jjl at pobox.com (John J. Lee)
wrote:
> > If you read them in parallel, it's _at most_ m (m is the worst case
> > here), not 2(m-1). In my tests, it has always been significantly
> > less than m.
>
> Hmm, Patrick's right, David, isn't he?
Yes, I was only considering pairwise comparisons. As he says,
simultaneously comparing all files in a group would avoid repeated reads
without the CPU overhead of a strong hash. Assuming you use a system
that allows you to have enough files open at once...
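To make the idea concrete, here is a minimal sketch (not the actual program under discussion) of that simultaneous comparison: read one block from each file in a group, split the group wherever blocks differ, and declare files identical when they reach EOF together with every prior block equal. Each byte is read once and no hash is computed, but the whole group must be open at the same time. The function name and block size are my own choices for illustration.

```python
def duplicate_groups(paths, blocksize=65536):
    """Partition paths into groups of files with identical contents."""
    groups = [[open(p, "rb") for p in paths]]
    result = []
    while groups:
        group = groups.pop()
        if len(group) == 1:
            group[0].close()  # unique file: no duplicate possible
            continue
        # Read one block from each file; files whose blocks differ
        # can never be duplicates, so the group splits here.
        blocks = {}
        for f in group:
            blocks.setdefault(f.read(blocksize), []).append(f)
        for block, files in blocks.items():
            if block == b"":
                # These files hit EOF together with every prior
                # block equal: they are byte-for-byte identical.
                if len(files) > 1:
                    result.append([f.name for f in files])
                for f in files:
                    f.close()
            else:
                groups.append(files)
    return result
```

Note the open-files limit mentioned above: in the worst case this holds one descriptor per file in the largest group, so on a typical system the group size is bounded by the per-process file descriptor limit.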
> And I'm not sure what the trade off between disk seeks and disk reads
> does to the problem, in practice (with caching and realistic memory
> constraints).
Another interesting point.
--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/