fdups: calling for beta testers
John Machin
sjmachin at lexicon.net
Sat Feb 26 15:57:44 EST 2005
Patrick Useldinger wrote:
> John Machin wrote:
>
> > (1) It's actually .bz2, not .bz (2) Why annoy people with the
> > not-widely-known bzip2 format just to save a few % of a 12KB file?? (3)
> > Typing that on Windows command line doesn't produce a useful result (4)
> > Haven't you heard of distutils?
>
> (1) Typo, thanks for pointing it out
> (2)(3) In the Linux world, it is really popular. I suppose you are a
> Windows user, and I haven't given that much thought. The point was not
> to save space, just to use the "standard" format. What would it be for
> Windows - zip?
Yes. Moreover, "WinZip", the most popular archive-handler, doesn't grok
bzip2.
> > (6) You are keeping open handles for all files of a given size -- have
> > you actually considered the possibility of an exception like this:
> > IOError: [Errno 24] Too many open files: 'foo509'
>
> (6) Not much I can do about this. In the beginning, all files of equal
> size are potentially identical. I first need to read a chunk of each,
> and if I want to avoid opening & closing files all the time, I need them
> open together.
> What would you suggest?
Test, like I did, to see how many open handles you can get away with. I
was not joking, 20 was the max on MS-DOS at one stage and I vaguely
recall: (a) some low limits on various flavours of *x (b) the "ulimit"
command can be used to vary the per-process limit but (c) there is a
system-wide limit also.
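
A minimal sketch of such a test, for illustration only: it opens os.devnull
repeatedly until the OS refuses with EMFILE/ENFILE. The function name
probe_open_file_limit and the cap parameter are made up for this example and
are not part of fdups; the cap is there so the probe stays cheap on systems
with a very high limit.

```python
import errno
import os


def probe_open_file_limit(cap=4096):
    """Return how many extra handles could be opened, stopping at `cap`.

    Hypothetical helper: opens os.devnull over and over until the OS
    reports "too many open files" or `cap` handles are held.
    """
    handles = []
    try:
        while len(handles) < cap:
            try:
                handles.append(open(os.devnull, "rb"))
            except OSError as exc:
                # EMFILE: per-process limit; ENFILE: system-wide limit.
                if exc.errno in (errno.EMFILE, errno.ENFILE):
                    break
                raise
    finally:
        # Always release whatever was opened.
        for handle in handles:
            handle.close()
    return len(handles)


if __name__ == "__main__":
    print("handles available (capped):", probe_open_file_limit())
```

The result is only a snapshot: handles already held by the process count
against the limit, so a real program should still be ready to catch the error.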
You should consider a fall-back method to be used in this case and in
the case of too many files for your 1MB (default) buffer pool. BTW 1MB
seems tiny; desktop PCs come with 512MB standard these days, and Bill
does leave a bit more than 1MB available for applications.
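
One possible fall-back, sketched under assumptions: instead of holding every
handle open, reopen each file for every chunk (open, seek, read, close), so at
most one handle is in use at a time. It is slower, but cannot hit Errno 24.
The names read_chunk and group_identical are invented for this sketch, it
assumes all paths passed in are already known to have the same size, and it is
not how fdups itself works.

```python
def read_chunk(path, offset, size):
    """Read `size` bytes at `offset`, holding the handle only briefly."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(size)


def group_identical(paths, chunk_size=65536):
    """Split same-size files into groups whose contents are identical.

    Hypothetical fall-back: compares chunk by chunk, reopening each file
    per chunk, and drops any file that no longer matches another one.
    """
    groups = [list(paths)]
    offset = 0
    while True:
        next_groups = []
        any_data = False
        for group in groups:
            buckets = {}
            for path in group:
                chunk = read_chunk(path, offset, chunk_size)
                if chunk:
                    any_data = True
                buckets.setdefault(chunk, []).append(path)
            # Only files that still share a chunk with another file survive.
            next_groups.extend(g for g in buckets.values() if len(g) > 1)
        groups = next_groups
        offset += chunk_size
        if not any_data or not groups:
            # End of file reached, or no candidate duplicates remain.
            return groups
```

A program could try the all-handles-open approach first and switch to
something like this only after catching the "Too many open files" error.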
> > And what is "chown" -- any relation of Perl's "chomp"?
>
> chown is a Unix command to change the owner or the group of a file. It
> has to do with controlling access to the file. It is not relevant on
> Windows. No relation to Perl's chomp.
The question was rhetorical. Your irony detector must be on the fritz.
:-)
> Did you actually run it on your
> Windows box?
Yes, with trepidation, after carefully reading the source. It detected
some highly plausible duplicates, which I haven't verified yet.
Cheers,
John