a program to delete duplicate files

Patrick Useldinger pu.news.001 at gmail.com
Sun Mar 13 02:28:52 EST 2005


John Machin wrote:

> Oh yeah, "the computer said so, it must be correct". Even with your
> algorithm, I would be investigating cases where files were duplicates
> but there was nothing in the names or paths that suggested how that
> might have come about.

Of course, but it's good to know that the computer is right, isn't it? 
That leaves the human to make decisions instead of double-checking.

> I beg your pardon, I was wrong. Bad memory. It's the case of running
> out of the minuscule buffer pool that you allocate by default where it
> panics and pulls the sys.exit(1) rip-cord.

The buffer pool size is a parameter, and the default values allow for 
4096 files of the same size. It's more likely to run out of file 
handles than out of buffer space, don't you think?

> The pythonic way is to press ahead optimistically and recover if you
> get bad news.

You're right, that's what I thought about afterwards. The current idea 
is to design a second class that opens, closes, and reads the files, 
and handles that situation independently of the main class.

> I didn't "ask"; I suggested. I would never suggest a
> class-for-classes-sake. You had already a singleton class; why
> another?" What I did suggest was that you provide a callable interface
> that returned clusters of duplicates [so that people could do their own
> thing instead of having to parse your file output which contains a
> mixture of warning & info messages and data].

That is what I have submitted to you. Are you sure that *I* am the 
lawyer here?

>>> Re (a): what evidence do you have?

See ;-)

> Interesting. Less on XP than on 2000? Maybe there's a machine-wide
> limit, not a per-process limit, like the old DOS max=20. What else was
> running at the time?

Nothing I started manually, just the usual background programs: local 
firewall and virus scanner (the latter not doing a complete machine 
scan at that time).

> Test:
> !for k in range(1000):
> !    open('foo' + str(k), 'w')

I'll try that.
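A recoverable variant of that test might look like this; instead of letting the failing open() crash the loop, catch the error, report how many opens succeeded, and clean up. The function name and the use of a temporary directory are my own additions for illustration:

```python
import os
import tempfile

def count_max_open_files(limit=1000):
    """Open files until the OS refuses, and return how many succeeded.

    Everything is created in a throwaway directory and closed and
    removed again before returning.
    """
    tmpdir = tempfile.mkdtemp()
    handles = []
    try:
        for k in range(limit):
            try:
                handles.append(open(os.path.join(tmpdir, 'foo%d' % k), 'w'))
            except OSError:      # out of file handles (EMFILE)
                break
        return len(handles)
    finally:
        for h in handles:
            h.close()
        for k in range(len(handles)):
            os.remove(os.path.join(tmpdir, 'foo%d' % k))
        os.rmdir(tmpdir)
```

On a box with the usual per-process limit of 1024 or more, this returns the requested limit; on a more constrained system it reports the actual ceiling instead of dying.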

> Announce:
> "I can open A files at once on box B running os C. The most files of
> the same length that I have seen is D. The ratio A/D is small enough
> not to worry."

I wouldn't count on that in a multi-tasking environment, as I said. The 
class I described earlier seems a cleaner approach.

Regards,
-pu



More information about the Python-list mailing list