a program to delete duplicate files
Patrick Useldinger
pu.news.001 at gmail.com
Sun Mar 13 02:28:52 EST 2005
John Machin wrote:
> Oh yeah, "the computer said so, it must be correct". Even with your
> algorithm, I would be investigating cases where files were duplicates
> but there was nothing in the names or paths that suggested how that
> might have come about.
Of course, but it's good to know that the computer is right, isn't it?
That leaves the human free to make decisions instead of double-checking.
> I beg your pardon, I was wrong. Bad memory. It's the case of running
> out of the minuscule buffer pool that you allocate by default where it
> panics and pulls the sys.exit(1) rip-cord.
The buffer pool size is a parameter, and the default values allow for
4096 files of the same size. It's more likely to run out of file handles
than out of buffer space, don't you think?
> The pythonic way is to press ahead optimistically and recover if you
> get bad news.
You're right, that's what I thought about afterwards. The current idea
is to design a second class that opens/closes/reads the files and
handles the situation independently of the main class.
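To make the idea concrete, here is a minimal sketch of what such a
reader class could look like. All names here are hypothetical (this is
not fdups code): each reader opens its file lazily, remembers its
offset, and can release its OS handle on demand so that the main class
can recover from "too many open files" instead of exiting:

```python
class FileReader:
    """Sketch of a reader that can give back its OS file handle
    and transparently reopen the file later, resuming where it
    left off. Hypothetical design, not the actual fdups class."""

    def __init__(self, path, chunk_size=8192):
        self.path = path
        self.chunk_size = chunk_size
        self._fh = None      # open handle, or None when released
        self._offset = 0     # position to resume from after a reopen

    def read_chunk(self):
        """Read the next chunk, reopening the file if needed."""
        if self._fh is None:
            self._fh = open(self.path, 'rb')
            self._fh.seek(self._offset)
        data = self._fh.read(self.chunk_size)
        self._offset += len(data)
        return data

    def release(self):
        """Close the handle so another reader can open its file;
        the next read_chunk() reopens and seeks back."""
        if self._fh is not None:
            self._fh.close()
            self._fh = None
```

On an `IOError`/`OSError` with `errno.EMFILE` ("too many open files"),
the main class would call `release()` on idle readers and retry the
open — press ahead optimistically and recover on bad news, as
suggested.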
> I didn't "ask"; I suggested. I would never suggest a
> class-for-classes-sake. You had already a singleton class; why
> another". What I did suggest was that you provide a callable interface
> that returned clusters of duplicates [so that people could do their own
> thing instead of having to parse your file output which contains a
> mixture of warning & info messages and data].
That is what I have submitted to you. Are you sure that *I* am the
lawyer here?
>>>Re (a): what evidence do you have?
See ;-)
> Interesting. Less on XP than on 2000? Maybe there's a machine-wide
> limit, not a per-process limit, like the old DOS max=20. What else was
> running at the time?
Nothing I started manually, just the usual background processes: local
firewall, virus scanner (not running a full machine scan at the time).
> Test:
> !for k in range(1000):
> ! open('foo' + str(k), 'w')
I'll try that.
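A slightly more careful version of that probe could count how many
opens succeed before the OS refuses, and clean up after itself. A
throwaway sketch, not part of fdups (the function name and the 10000
cap are my own):

```python
import os
import tempfile

def max_open_files(limit=10000):
    """Open empty files until the OS refuses (or we hit `limit`),
    then report how many were open at once. Handles are closed
    before returning so the probe leaves nothing behind."""
    tmpdir = tempfile.mkdtemp()
    handles = []
    try:
        for k in range(limit):
            try:
                handles.append(open(os.path.join(tmpdir, 'foo%d' % k), 'w'))
            except (OSError, IOError):
                break        # hit the per-process handle limit
        return len(handles)
    finally:
        for h in handles:
            h.close()
```

Running it on each box would give the "A" figure for the announcement
below.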
> Announce:
> "I can open A files at once on box B running os C. The most files of
> the same length that I have seen is D. The ratio A/D is small enough
> not to worry."
I wouldn't count on that in a multi-tasking environment, as I said. The
class I described earlier seems a cleaner approach.
Regards,
-pu