using python to parse md5sum list

Michael Hoffman at mh391.invalid
Sun Mar 6 10:49:29 CET 2005

Ben Rf wrote:

> I'm new to programming and i'd like to write a program that will parse
> a list produced by md5summer and give me a report in a text file on
> which md5 sums appear more than once and where they are located.

This should do the trick:

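import fileinput

# Map each md5 sum to the list of filenames that share it.
md5s = {}
for line in fileinput.input():
    # Split on the first run of whitespace only, so filenames may contain spaces.
    md5, filename = line.rstrip().split(None, 1)
    md5s.setdefault(md5, []).append(filename)

# Report only the sums that appear more than once.
for md5, filenames in md5s.iteritems():
    if len(filenames) > 1:
        print "\t".join(filenames)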

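Put this in a script file and you can then run it with [FILE]... arguments to
find duplicates in any of the files you specify (with no arguments it reads
standard input). Each group of duplicates is then printed on one line as a
tab-delimited list of filenames.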
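
For readers on Python 3, where dict.iteritems() and the print statement no longer exist, a rough equivalent might look like the sketch below. The helper names find_duplicates and format_report are hypothetical and not part of the original reply, and the report layout (md5 sum first, then the filenames, tab-separated) is an assumption about what a useful report would contain.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Group filenames by md5 sum and keep only sums seen more than once.

    Hypothetical helper; assumes each line looks like "<md5> <filename>".
    """
    md5s = defaultdict(list)
    for line in lines:
        # Split on the first run of whitespace so filenames may contain spaces.
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        md5, filename = parts
        md5s[md5].append(filename)
    return {md5: names for md5, names in md5s.items() if len(names) > 1}


def format_report(duplicates):
    """Return one tab-delimited line per duplicated sum (assumed layout)."""
    return "".join(
        md5 + "\t" + "\t".join(names) + "\n"
        for md5, names in sorted(duplicates.items())
    )
```

To use it on real files, pass fileinput.input() as the lines argument and write the string returned by format_report to a text file of your choice.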

Key things you might want to look up to understand this:

* the dict datatype
* dict.setdefault()
* dict.iteritems()
* the fileinput module
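
As a small illustration of the first two items: dict.setdefault() returns the value stored under a key, first inserting the given default if the key is missing. That is what lets the loop above append to a list without checking whether the key exists yet. The keys and filenames below are made up for the example, and the print call is written for Python 3.

```python
md5s = {}

# First time a key is seen: setdefault stores the empty list and returns it.
md5s.setdefault("abc", []).append("one.txt")

# Key already present: the default is ignored and the existing list is returned.
md5s.setdefault("abc", []).append("two.txt")
md5s.setdefault("def", []).append("three.txt")

print(md5s)  # {'abc': ['one.txt', 'two.txt'], 'def': ['three.txt']}
```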

Michael Hoffman
