[Mailman-Users] Filtering out duplicate emails

Chiang Wu cwu at asterpix.com
Fri Feb 26 22:59:48 CET 2010

So I'm trying to implement a custom handler in Mailman and I'm a bit stuck.
I wrote some Python code that compares an incoming e-mail against a text file
and tries not to send the e-mail out if its content matches something in
the file. I can't seem to find any examples of custom handlers, so this
is what I have come up with so far. Any help would be appreciated.

#Version 2

#Receive E-mail.
#Compare e-mail to filter file.
#Compare Content vs a file. File will have messages that should be examined
#against the current email.
#If matches, remove ToOutgoing from the pipeline in the message's metadata.

import sys

from Mailman import mm_cfg
from Mailman import Errors
from Mailman.Logging.Syslog import syslog
from Mailman.Queue.sbcache import get_switchboard

# Text file holding the messages to filter against. Each message sits between
# a pair of '~~' delimiter lines. An absolute path is safer here, since the
# qrunner's working directory may not be the one you expect.
FILTER_FILE = 'test.txt'

def load_filter_messages(path):
    # read in between delimiter and other delimiter (~~) and place into message
    message = []
    infile = open(path, 'r')
    try:
        inputline = infile.readline()
        while len(inputline) != 0: #EOF if length = 0. Ends the while loop
            if inputline.rstrip('\r\n') == '~~':
                fullline = ''
                inputline = infile.readline()
                while len(inputline) != 0 and inputline.rstrip('\r\n') != '~~':
                    fullline += inputline
                    inputline = infile.readline()
                # Remove the trailing newline characters (\n) in the message
                # and skip empty blocks, which would match every e-mail.
                fullline = fullline.rstrip('\n')
                if fullline:
                    message.append(fullline)
            inputline = infile.readline()
    finally:
        infile.close()
    return message

# This handler is meant to take the place of ToOutgoing in the pipeline: a
# message that matches the filter file is never queued for delivery.

def process(mlist, msg, msgdata):
    # gets all the lines of the message (headers included)
    msgtext = str(msg)
    # compare each filter message to the email text. if one matches, log it
    # and return without queueing the message
    for filtertext in load_filter_messages(FILTER_FILE):
        if filtertext in msgtext:
            syslog('vette', 'Filtered duplicate content, not delivering: %s',
                   msg.get('message-id', 'n/a'))
            return

    #ToOutgoing.py portion of the code

    interval = mm_cfg.VERP_DELIVERY_INTERVAL
    # Should we VERP this message?  If personalization is enabled for this
    # list and VERP_PERSONALIZED_DELIVERIES is true, then yes we VERP it.
    # Also, if personalization is /not/ enabled, but VERP_DELIVERY_INTERVAL is
    # set (and we've hit this interval), then again, this message should be
    # VERPed. Otherwise, no.
    # Note that the verp flag may already be set, e.g. by mailpasswds using
    # VERP_PASSWORD_REMINDERS.  Preserve any existing verp flag.
    if msgdata.has_key('verp'):
        pass
    elif mlist.personalize:
        if mm_cfg.VERP_PERSONALIZED_DELIVERIES:
            msgdata['verp'] = 1
    elif interval == 0:
        # Never VERP
        pass
    elif interval == 1:
        # VERP every time
        msgdata['verp'] = 1
    else:
        # VERP every `interval' number of times
        msgdata['verp'] = not int(mlist.post_id) % interval
    # And now drop the message in qfiles/out
    outq = get_switchboard(mm_cfg.OUTQUEUE_DIR)
    outq.enqueue(msg, msgdata, listname=mlist.internal_name())
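
For comparison, the last step in the header comments (removing ToOutgoing
from the pipeline in the message's metadata) might be sketched as below. This
is only a sketch, and it rests on an assumption: that the runner keeps the
remaining handler names as a list under msgdata['pipeline'] and works through
that same list object. FILTER_TEXTS, is_duplicate and skip_delivery are
made-up names for illustration, not part of Mailman.

```python
# Hypothetical filter texts; a real handler would load these from a file.
FILTER_TEXTS = ['buy now']

def is_duplicate(msg):
    """Hypothetical check: does any filter text occur in the message?"""
    msgtext = str(msg)
    return any(text in msgtext for text in FILTER_TEXTS)

def skip_delivery(msgdata):
    """Remove 'ToOutgoing' from the remaining pipeline, in place."""
    # Assumption: the remaining handler names are a list stored under
    # msgdata['pipeline'], and the runner consults this same list object.
    pipeline = msgdata.get('pipeline', [])
    while 'ToOutgoing' in pipeline:
        pipeline.remove('ToOutgoing')

def process(mlist, msg, msgdata):
    # Leave the message alone unless it matches one of the filter texts.
    if is_duplicate(msg):
        skip_delivery(msgdata)
```

A handler like this would have to sit in the pipeline ahead of ToOutgoing.
With the stock ToOutgoing left in place, archiving would presumably be
unaffected.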

On Thu, Feb 11, 2010 at 7:50 AM, Mark Sapiro <mark at msapiro.net> wrote:

> Chiang Wu wrote:
> >Hi. I was wondering if anyone knows a way to filter out, yet still archive,
> >new e-mails that have the same content as an already archived e-mail.
>
> You would have to write a custom handler[1] to examine the content of
> this mail and compare it to something and if it is a 'duplicate',
> remove 'ToOutgoing' from the pipeline in this message's metadata.
>
> Comparing it to the archive is problematic because archiving is
> asynchronous with incoming message processing, and if two 'duplicates'
> arrive close in time, the first may not be archived when you process
> the second.
>
> If you want to avoid truly identical content, this handler could keep a
> small database of some hash of the content and its process time for
> lookup as subsequent messages arrive. If the 'duplicate' content
> differs in things like time stamps only, you could filter those before
> hashing.
>
> [1] <http://wiki.list.org/x/l4A9>
> --
> Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
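
The hash lookup suggested above could look roughly like the following. It is
a minimal sketch and not a Mailman API: the in-memory dictionary, the one-day
MAX_AGE, the HH:MM:SS time stamp pattern and the function names are all
assumptions chosen for illustration.

```python
import hashlib
import re
import time

# Hypothetical in-memory store: content digest -> time it was last seen.
_seen = {}
MAX_AGE = 24 * 60 * 60  # assumed retention period, in seconds

# Illustrative pattern for HH:MM:SS time stamps to strip before hashing.
_TIMESTAMP = re.compile(r'\b\d{1,2}:\d{2}:\d{2}\b')

def content_digest(body):
    """Hash the body after removing time stamps and collapsing whitespace."""
    normalized = _TIMESTAMP.sub('', body)
    normalized = ' '.join(normalized.split())
    return hashlib.sha1(normalized.encode('utf-8')).hexdigest()

def seen_before(body, now=None, seen=_seen, max_age=MAX_AGE):
    """Return True if equivalent content was recorded within max_age."""
    if now is None:
        now = time.time()
    # Drop entries older than max_age so the store stays small.
    for digest in [d for d, t in seen.items() if now - t > max_age]:
        del seen[digest]
    digest = content_digest(body)
    duplicate = digest in seen
    seen[digest] = now  # record (or refresh) the process time
    return duplicate
```

A handler's process() could call seen_before() on the message body and skip
delivery when it returns True. A dictionary like this is lost whenever the
qrunner restarts, so a real handler would likely need a file-backed store.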

