[Tutor] Finding duplicate files

lonetwin <lonetwin@yahoo.com>
Thu, 16 Aug 2001 17:39:54 +0530 (IST)


Greetings everybody,
	Here's something I wrote up because I felt the need for it. This little script
recursively looks for files with the same name (duplicate files) and prints the
results on the console. One reason I'm posting it here is that maybe someone else
has a use for it (you'd be surprised by the amount of clutter I managed to trace
in my download directory !!! :)), but more importantly, I'd like the gurus to
comment on alternate techniques and clever tricks that might make this work
better ... I'm sure I'll learn something :D !!!

Peace
Steve

P.S.: If the gurus are too busy for something as silly as this, that's OK!

code:
=============================================================
#!/usr/bin/python
import os

def getFileList(lst, dirname, names):    # visit function passed to os.path.walk()
	for x in names:
		if os.path.isfile(os.path.join(dirname, x)):
			lst.append(os.path.join(dirname, x))
	return lst

def findDup(lst):
	found = []
	for x in range(len(lst)):
		for y in lst[x+1:]:
			if os.path.basename(lst[x]) == os.path.basename(y):
				found.append('%s and %s' % (y, lst[x]))
	return found

p = []
topdir = '/home/backups/Tarballs'    # The top level directory for the search
os.path.walk(topdir, getFileList, p)
found = findDup(p)
print '\n'.join(found)

=============================================================
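Since the post asks for alternate techniques: the pairwise comparison in
findDup() is O(n^2), and a dict keyed on basename does the same job in one
pass. Here's a minimal sketch, assuming a modern Python where os.walk() is
available (the os.path.walk() used above was later removed); the function
name find_dups is just my own:

```python
import os

def find_dups(topdir):
    """Return {basename: [full paths]} for names seen in more than one place."""
    seen = {}  # basename -> list of full paths
    for dirname, subdirs, names in os.walk(topdir):
        for name in names:
            path = os.path.join(dirname, name)
            if os.path.isfile(path):
                # group every file under its basename; no pairwise scan needed
                seen.setdefault(name, []).append(path)
    # keep only the names that occurred more than once
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

Each file is touched once, so the cost is linear in the number of files
(plus the dict lookups), instead of comparing every pair.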
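One more thought in the same spirit: two files with the same name aren't
necessarily duplicates, and two duplicates can have different names. Hashing
the contents catches true duplicates regardless of name. A sketch, again
assuming a modern Python (hashlib postdates this post); the name
find_dup_contents is my own:

```python
import hashlib
import os

def find_dup_contents(topdir):
    """Return a list of groups of paths whose file contents are identical."""
    by_hash = {}  # content digest -> list of paths
    for dirname, subdirs, names in os.walk(topdir):
        for name in names:
            path = os.path.join(dirname, name)
            if not os.path.isfile(path):
                continue
            h = hashlib.md5()
            with open(path, 'rb') as f:
                # hash in chunks so big files don't have to fit in memory
                for chunk in iter(lambda: f.read(65536), b''):
                    h.update(chunk)
            by_hash.setdefault(h.hexdigest(), []).append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

For a huge tree you'd want to group by file size first and hash only the
sizes that collide, but the idea is the same.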
Another P.S.:
   If nobody has any use for this I guess we all know where it belongs :)
 Rob, hope this (or its improvements) qualifies.
----------------------------------------------------------------------------
	You will pay for your sins.  If you have already paid, please disregard
	this message.
----------------------------------------------------------------------------