using python to parse md5sum list

James Stroud jstroud at mbi.ucla.edu
Sat Mar 5 23:19:57 EST 2005


Among many other things:

First, you might want to look at os.path.walk() (or os.walk(), its easier generator-based successor; os.path.walk() no longer exists in Python 3).
Second, look at the string data type.

Third, get the Python Essential Reference.

Also, Programming Python (O'Reilly) actually has a lot in it about stuff like 
this. It's a tedious read, but in the end it will help a lot with administrative 
tasks like the one you are doing here.
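If you would rather have Python compute the sums itself than rely on md5summer, here is a minimal sketch of the os.walk() route. It is not from the original post: it assumes a current Python 3, uses the standard hashlib module for the hashing, and the function names md5_of_file() and md5_listing() are hypothetical. Each file is read in chunks so large files do not have to fit in memory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of one file (hypothetical helper name)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_listing(top):
    """Yield (md5, path) for every regular file below the directory `top`.

    Hypothetical helper; os.walk() visits `top` and all its subdirectories.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip broken links and other oddities
                yield md5_of_file(path), path
```

Printing each pair as `"%s  %s" % (md5, path)` gives lines in the same two-space layout md5sum uses, so the parsing code below works on the result too. Pointing md5_listing() at a shared folder on each machine is one way to cover the whole LAN, assuming the shares are reachable from where the script runs.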

So, with the understanding that you will look at these references, I will 
foolishly save you a little time...

If you are using md5sum, you can grab the md5 and the filename like this:

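md5sums = []
with open(filename) as myfile:
    for aline in myfile:
        # Strip the line ending ("\n", or "\r\n" from Windows tools) instead
        # of blindly chopping the last character, then split
        # "<md5>  <path>" at the first two-space separator.
        md5sums.append(aline.rstrip("\r\n").split("  ", 1))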

Each entry in the md5sums list will be a two-element list: element 0 is the 
md5 sum and element 1 is the path to the file.
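That gets you the raw pairs; the original question also asks for a text report of which sums occur more than once and where. A minimal sketch of that step follows, under stated assumptions: each line has the GNU md5sum layout (32 hex digits, a space, then a second space in text mode or `*` in binary mode, then the path), and it is only an assumption that md5summer writes the same layout. It slices at fixed columns instead of splitting on two spaces so the `*` form parses too. The function names, the source labels, and the report format are all hypothetical; a `defaultdict(list)` groups paths by hash.

```python
import string
from collections import defaultdict

HEX_DIGITS = set(string.hexdigits)


def parse_md5_line(line):
    """Return (md5, path) for one md5sum-style line, or None if it isn't one.

    Assumes: 32 hex digits, a space, then ' ' (text) or '*' (binary), then path.
    """
    line = line.rstrip("\r\n")
    if len(line) < 35 or line[32] != " " or line[33] not in " *":
        return None
    md5 = line[:32]
    if not all(c in HEX_DIGITS for c in md5):
        return None
    return md5.lower(), line[34:]


def find_duplicates(listings):
    """Map each md5 seen more than once to the (source, path) pairs holding it.

    `listings` is an iterable of (source, lines) pairs; `source` is any label
    (e.g. a machine name) and `lines` any iterable of text lines, such as an
    open file.
    """
    by_md5 = defaultdict(list)
    for source, lines in listings:
        for line in lines:
            parsed = parse_md5_line(line)
            if parsed is not None:  # silently skip blank or foreign lines
                md5, path = parsed
                by_md5[md5].append((source, path))
    return {md5: places for md5, places in by_md5.items() if len(places) > 1}


def write_report(duplicates, report_path):
    """Write one block per duplicated md5: the hash, then each location."""
    with open(report_path, "w") as out:
        for md5 in sorted(duplicates):
            out.write(md5 + "\n")
            for source, path in duplicates[md5]:
                out.write("    %s: %s\n" % (source, path))
```

With one listing per computer, you would pass something like `[("pc1", open("pc1.md5")), ("pc2", open("pc2.md5"))]` (made-up file names) to find_duplicates() and hand the result to write_report(). The source label matters because the same relative path can exist on several machines.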


James

On Saturday 05 March 2005 07:54 pm, Ben Rf wrote:
> Hi
>
> I'm new to programming and I'd like to write a program that will parse
> a list produced by md5summer and give me a report in a text file on
> which md5 sums appear more than once and where they are located.
>
> The end goal is to have a way of finding duplicate files that are
> scattered across a LAN of 4 Windows computers.
>
> I've dabbled with different languages over the years and I think
> Python is a good language for this, but I have had a lot of trouble
> sifting through manuals and tutorials finding out which commands I need
> and their syntax.
>
> Can someone please help me?
>
> Thanks.
>
> Ben

-- 
James Stroud, Ph.D.
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095