Howto find same files?

Erik Max Francis max at alcyone.com
Sat Oct 28 18:33:48 EDT 2000


gregoire.favre at ima.unil.ch wrote:

> Would it be a good idea to create a files which contains the
> path,filename,size,md5sum and then working on it?

My utility which eliminates duplicate files (written in Python) simply
keeps 1. the filename it first saw the file as (for reference), 2. a tag
which can be used to indicate in what group it was seen (say, a date),
3. the file size, and 4. a 32-bit CRC.  Since you're evidently already
using a UNIX-like system, you can get the CRC for free:  `cksum'.  (This
also gives you the file size without any extra work, which is helpful
not so much for uniquely identifying the file as for being able
to determine the size of your processed collection from the database
alone.)
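
For illustration, here is a minimal sketch of that kind of record-keeping
in Python.  It is not the utility itself; the names file_signature and
scan, and the layout of the `seen' dictionary, are made up for this
example.  It also assumes zlib.crc32 as the checksum, which is a 32-bit
CRC but uses a different algorithm from `cksum', so the two will not
produce the same values.

```python
import os
import zlib


def file_signature(path, chunk_size=1 << 16):
    """Return (size, crc32) for the file at path, reading it in chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # Feed the running CRC back in so large files need not fit in memory.
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return size, crc & 0xFFFFFFFF


def scan(root, tag, seen=None):
    """Walk root and return (seen, duplicates).

    seen maps (size, crc) -> (first filename seen, tag); this dictionary
    is the hypothetical "database".  duplicates is a list of
    (duplicate path, first path) pairs.
    """
    if seen is None:
        seen = {}
    duplicates = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is stable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            key = file_signature(path)
            if key in seen:
                duplicates.append((path, seen[key][0]))
            else:
                seen[key] = (path, tag)
    return seen, duplicates
```

Calling scan("/some/dir", "2000-10-28") would return the dictionary plus
a list of suspected duplicates.  A matching size and 32-bit CRC makes a
duplicate very likely but not certain, so a cautious version would
compare the two files byte for byte before deleting anything.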

It's really quite straightforward to write one; what's giving you
trouble?

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ That which is resisted persists.
\__/ Camden Benares
    Esperanto reference / http://mirror/alcyone/max/lang/esperanto/
 An Esperanto reference for English speakers.


