[Python Wpg] remove dup mails

Peter O'Gorman peter at pogma.com
Wed Oct 31 13:03:24 EDT 2007


I mentioned at the meeting that fetchmail went mad and downloaded my
mail messages repeatedly, leaving me with multiple copies of several
hundred messages. The files were not identical, but they differed only in
their "Received" headers. This is the Python script I came up with (it took
a while; I had to spend a good deal of time reading the docs).

I'm sure that there are better ways to do this, and would not mind a
critique, but this did work.

Thanks,
Peter

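#! /usr/bin/python
# Remove duplicate messages from a Maildir. Two files count as duplicates
# when they are identical once their "Received" headers are dropped.
import os
import email
import hashlib

dups = {}  # digest -> path of the first copy seen

for root, dirs, files in os.walk('/home/pogma/Maildir'):
  for fname in files:
    path = os.path.join(root, fname)
    try:
      fobj = open(path)
    except (IOError, OSError):
      continue  # could not open; nothing to close
    try:
      msg = email.message_from_file(fobj)
    except Exception:
      continue  # could not parse; skip this file
    finally:
      fobj.close()
    # Deletes every "Received" header; no error if there are none.
    del msg['Received']
    # Python 2: as_string() returns a byte string, which md5 accepts.
    digest = hashlib.md5(msg.as_string()).hexdigest()
    if digest not in dups:
      dups[digest] = path
    else:
      os.unlink(path)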
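
For anyone who wants to check before deleting anything, here is a rough
Python 3 sketch of the same idea as the script above. It is not part of the
original script: the function names dedup_key and find_duplicates are made
up for illustration, it reads the files as bytes (which Python 3's md5
requires), and it assumes the usual Maildir layout by only looking inside
directories named "cur" and "new". It only reports duplicates; it deletes
nothing.

```python
import email
import hashlib
import os


def dedup_key(raw):
    """Return an MD5 hex digest of a message with its Received headers removed.

    Hypothetical helper: `raw` is the full message as bytes.
    """
    msg = email.message_from_bytes(raw)
    del msg['Received']  # removes every occurrence; no error if absent
    return hashlib.md5(msg.as_bytes()).hexdigest()


def find_duplicates(maildir):
    """Return (kept, duplicates) without deleting anything.

    kept maps each digest to the first path seen with that digest;
    duplicates lists the paths of later copies.
    """
    kept = {}
    duplicates = []
    for root, dirs, files in os.walk(maildir):
        dirs.sort()  # visit subdirectories in a predictable order
        # Assumption: messages live only in "cur" and "new" directories.
        if os.path.basename(root) not in ('cur', 'new'):
            continue
        for fname in sorted(files):
            path = os.path.join(root, fname)
            try:
                with open(path, 'rb') as fobj:
                    key = dedup_key(fobj.read())
            except OSError:
                continue  # unreadable file; skip it
            if key in kept:
                duplicates.append(path)
            else:
                kept[key] = path
    return kept, duplicates
```

After looking over the second list returned by find_duplicates, the copies
can be removed by calling os.unlink on each path in it.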


