[Python Wpg] remove dup mails
Peter O'Gorman
peter at pogma.com
Wed Oct 31 13:03:24 EDT 2007
I mentioned at the meeting that fetchmail went mad and downloaded my
mail messages repeatedly, leaving me with multiple copies of several
hundred messages. The files were not identical, but differed only in
their "Received" headers. This is the Python script I came up with (it
took a while; I had to spend a good deal of time reading the docs).
I'm sure that there are better ways to do this, and would not mind a
critique, but this did work.
Thanks,
Peter
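#! /usr/bin/python
# Remove duplicate messages from a Maildir, ignoring "Received" headers.
# Assumes the interpreter is Python 3.
import os
import email
import hashlib

dups = {}  # digest -> path of the first message seen with that digest
for root, dirs, files in os.walk('/home/pogma/Maildir'):
    for fname in files:
        path = os.path.join(root, fname)
        try:
            # Binary mode avoids decoding errors on 8-bit mail.
            with open(path, 'rb') as fobj:
                msg = email.message_from_binary_file(fobj)
        except OSError:
            # Unreadable file: skip it rather than abort the whole run.
            continue
        # Drop every "Received" header so copies that differ only there match.
        del msg['Received']
        digest = hashlib.md5(msg.as_bytes()).hexdigest()
        if digest not in dups:
            dups[digest] = path
        else:
            os.unlink(path)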
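
One possible variation, offered only as a sketch: the same idea split into
two functions, so that duplicates are reported rather than deleted and the
hashing step can be tried on its own first. The names message_digest and
find_duplicates are made up for illustration and are not part of the script
above; Python 3 is assumed.

```python
import email
import hashlib
import os


def message_digest(raw_bytes):
    """Return an MD5 hex digest of a message with its Received headers removed."""
    msg = email.message_from_bytes(raw_bytes)
    del msg['Received']  # removes every Received header, not just the first
    return hashlib.md5(msg.as_bytes()).hexdigest()


def find_duplicates(maildir):
    """Return paths of messages whose digest matches an earlier message."""
    seen = {}        # digest -> first path seen with that digest
    duplicates = []
    for root, dirs, files in os.walk(maildir):
        for fname in sorted(files):
            path = os.path.join(root, fname)
            try:
                with open(path, 'rb') as fobj:
                    digest = message_digest(fobj.read())
            except OSError:
                continue  # unreadable file: skip it
            if digest in seen:
                duplicates.append(path)
            else:
                seen[digest] = path
    return duplicates
```

Printing the result of find_duplicates('/home/pogma/Maildir') and checking a
few of the listed files by hand before calling os.unlink on them would make
the clean-up a little less nerve-racking.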