fdups: calling for beta testers

John Machin sjmachin at lexicon.net
Sat Feb 26 15:57:44 EST 2005


Patrick Useldinger wrote:
> John Machin wrote:
>
> > (1) It's actually .bz2, not .bz (2) Why annoy people with the
> > not-widely-known bzip2 format just to save a few % of a 12KB file??
> > (3) Typing that on Windows command line doesn't produce a useful
> > result (4) Haven't you heard of distutils?
>
> (1) Typo, thanks for pointing it out
> (2)(3) In the Linux world, it is really popular. I suppose you are a
> Windows user, and I haven't given that much thought. The point was
> not to save space, just to use the "standard" format. What would it
> be for Windows - zip?

Yes. Moreover, "WinZip", the most popular archive-handler, doesn't grok
bzip2.

> > (6) You are keeping open handles for all files of a given size --
> > have you actually considered the possibility of an exception like
> > this:
> > IOError: [Errno 24] Too many open files: 'foo509'
>
> (6) Not much I can do about this. In the beginning, all files of
> equal size are potentially identical. I first need to read a chunk of
> each, and if I want to avoid opening & closing files all the time, I
> need them open together.
> What would you suggest?

Test, like I did, to see how many open handles you can get away with. I
was not joking: 20 was the max on MS-DOS at one stage, and I vaguely
recall that (a) there are some low limits on various flavours of *x,
(b) the "ulimit" command can be used to vary the per-process limit, but
(c) there is a system-wide limit also.
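
For what it's worth, a test of that kind can be as simple as the sketch
below. It is only an illustration, not the script I actually ran: the
function name and the "limit" safety cap are made up for the example.
It opens the same scratch file over and over until the OS refuses, then
reports how many handles it got.

```python
import os
import tempfile


def count_openable_files(limit=100000):
    """Open one scratch file repeatedly until the OS refuses.

    Returns the number of handles obtained. `limit` is a hypothetical
    safety cap so the probe stops on systems with very high limits.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "probe")
    with open(path, "wb"):
        pass  # create an empty file to open repeatedly
    handles = []
    try:
        while len(handles) < limit:
            try:
                handles.append(open(path, "rb"))
            except OSError:  # e.g. [Errno 24] Too many open files
                break
        return len(handles)
    finally:
        # Release everything before cleaning up the scratch file.
        for handle in handles:
            handle.close()
        os.remove(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print("handles obtained:", count_openable_files())
```

The figure is a little lower than the real per-process limit, because
stdin, stdout, stderr and anything else the interpreter holds open count
against it too. On Unix-like systems the soft and hard per-process
limits can also be read directly with
resource.getrlimit(resource.RLIMIT_NOFILE); that module is not available
on Windows.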

You should consider a fall-back method to be used in this case and in
the case of too many files for your 1MB (default) buffer pool. BTW 1MB
seems tiny; desktop PCs come with 512MB standard these days, and Bill
does leave a bit more than 1MB available for applications.
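
One possible shape for such a fall-back, purely as a sketch: when a
group of same-size files is too big to hold open at once, read them one
at a time and group them by a digest of their contents. This is not what
fdups does; the function name and the chunk_size parameter are invented
for the example, and SHA-256 is just one reasonable choice of digest.

```python
import hashlib
from collections import defaultdict


def group_by_digest(paths, chunk_size=1 << 20):
    """Group same-size candidate files by SHA-256 of their contents.

    Only one file is open at any moment, so the open-handle limit is
    never an issue. Returns a list of lists of paths that share a digest.
    """
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        groups[digest.hexdigest()].append(path)
    # Keep only digests shared by two or more files.
    return [group for group in groups.values() if len(group) > 1]
```

The price is that every file gets read in full, whereas a chunk-by-chunk
comparison of open files can give up at the first difference. That is
why it makes sense only as a fall-back. A cautious tool would also
confirm digest matches with a byte-for-byte comparison before calling
two files duplicates.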

> > And what is "chown" -- any relation of Perl's "chomp"?
>
> chown is a Unix command to change the owner or the group of a file.
> It has to do with controlling access to the file. It is not relevant
> on Windows. No relation to Perl's chomp.

The question was rhetorical. Your irony detector must be on the fritz. :-)

> Did you actually run it on your
> Windows box?

Yes, with trepidation, after carefully reading the source. It detected
some highly plausible duplicates, which I haven't verified yet.

Cheers,
John



