[Mailman-Users] Filtering out duplicate emails
Chiang Wu
cwu at asterpix.com
Fri Feb 26 22:59:48 CET 2010
So I'm trying to implement a custom handler in Mailman and I'm a bit stuck.
I wrote some Python code that compares an incoming email to a text file and
tries not to send the email out if its content matches something in that
file. I can't seem to find any examples of custom handlers, so this is what
I have come up with so far. Any help would be appreciated.
#pythonhandlerv2.py
#Version 2
#Receive E-mail.
#Compare e-mail to filter file.
#Compare Content vs a file. File will have messages that should be examined
#against the current email.
#If matches, remove ToOutgoing from the pipeline in the message's metadata.
import sys
from Mailman import mm_cfg
from Mailman import Errors
from Mailman.Logging.Syslog import syslog
from Mailman.Queue.sbcache import get_switchboard
# gets all the lines of the message
msgtext = str(msg)
#test to see if log file catches the message
FILE = open('log.txt', 'w')
FILE.write(msgtext)
FILE.close()
message = []
n=0
inputline = ''
fullline = ''
infile = file('test.txt', 'r')
inputline = infile.readline()
while True:
    if len(inputline) == 0: #EOF if length = 0. Breaks out of the while loop
        break
    else: #read in between delimiter and other delimiter (~~) and place
          #into message
        if inputline == '~~\n':
            message.append(fullline)
            fullline = ''
            n+=1
            inputline = infile.readline()
        else:
            fullline += inputline
            inputline = infile.readline()
n-=1
#print repr(strvalue2)
while n>=0:
    message[n]=message[n][:-1] #Removes the last new line character (\n) in
                               #the message
    #print repr(message[n])
    #compare message[] to the email text. if matches, break and exit program
    if message[n] in msgtext:
        sys.exit(0)
    else: #else keep looping until end of loop and place outgoing.py after
          #the loop
        n-=1
#ToOutgoing.py portion of the code
def process(mlist, msg, msgdata):
    interval = mm_cfg.VERP_DELIVERY_INTERVAL
    # Should we VERP this message? If personalization is enabled for this
    # list and VERP_PERSONALIZED_DELIVERIES is true, then yes we VERP it.
    # Also, if personalization is /not/ enabled, but VERP_DELIVERY_INTERVAL is
    # set (and we've hit this interval), then again, this message should be
    # VERPed. Otherwise, no.
    #
    # Note that the verp flag may already be set, e.g. by mailpasswds using
    # VERP_PASSWORD_REMINDERS. Preserve any existing verp flag.
    if msgdata.has_key('verp'):
        pass
    elif mlist.personalize:
        if mm_cfg.VERP_PERSONALIZED_DELIVERIES:
            msgdata['verp'] = 1
    elif interval == 0:
        # Never VERP
        pass
    elif interval == 1:
        # VERP every time
        msgdata['verp'] = 1
    else:
        # VERP every `interval' number of times
        msgdata['verp'] = not int(mlist.post_id) % interval
    # And now drop the message in qfiles/out
    outq = get_switchboard(mm_cfg.OUTQUEUE_DIR)
    outq.enqueue(msg, msgdata, listname=mlist.internal_name())
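
A minimal sketch of how the matching step could live inside a handler's
process() function, so that it runs once per message rather than at import
time (where msg is not yet defined). It assumes the handler sits in the
pipeline ahead of ToOutgoing and that msgdata['pipeline'] holds the names of
the handlers still to run, which is how I read the suggestion quoted below.
The file name test.txt and the ~~ delimiter come from the code above;
PATTERN_FILE, DELIMITER and load_patterns are hypothetical names. Since
sys.exit() raises SystemExit, which is meant to end the whole process, the
sketch simply returns instead.

```python
# Hypothetical handler sketch; not taken from the Mailman distribution.
PATTERN_FILE = 'test.txt'   # assumed location of the filter file
DELIMITER = '~~'            # each stored message ends with a line holding only ~~


def load_patterns(path):
    """Return the delimiter-terminated blocks found in the file at path."""
    patterns = []
    current = []
    infile = open(path)
    try:
        for line in infile:
            if line.rstrip('\r\n') == DELIMITER:
                # End of one stored message: join it and drop trailing newlines.
                block = ''.join(current).rstrip('\n')
                if block:
                    patterns.append(block)
                current = []
            else:
                current.append(line)
    finally:
        infile.close()
    # Text after the last delimiter is ignored, as in the loop above.
    return patterns


def process(mlist, msg, msgdata):
    """Skip delivery when the message text contains a stored block."""
    msgtext = str(msg)
    for pattern in load_patterns(PATTERN_FILE):
        if pattern in msgtext:
            # Assumption: msgdata['pipeline'] lists the handlers still to run.
            pipeline = msgdata.get('pipeline', [])
            if 'ToOutgoing' in pipeline:
                pipeline.remove('ToOutgoing')
            return
```

With this shape the stock ToOutgoing handler stays untouched; a matching
message just never reaches it, while the other handlers still run.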
On Thu, Feb 11, 2010 at 7:50 AM, Mark Sapiro <mark at msapiro.net> wrote:
> Chiang Wu wrote:
>
> >Hi. I was wondering if anyone knows a way to filter out, yet still archive,
> >new e-mails that have the same content as an already archived e-mail.
>
>
> You would have to write a custom handler[1] to examine the content of
> this mail and compare it to something and if it is a 'duplicate',
> remove 'ToOutgoing' from the pipeline in this message's metadata.
>
> Comparing it to the archive is problematic because archiving is
> asynchronous with incoming message processing, and if two 'duplicates'
> arrive close in time, the first may not be archived when you process
> the second.
>
> If you want to avoid truly identical content, this handler could keep a
> small database of some hash of the content and its process time for
> lookup as subsequent messages arrive. If the 'duplicate' content
> differs in things like time stamps only, you could filter those before
> hashing.
>
>
>
> [1] <http://wiki.list.org/x/l4A9>
>
> --
> Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
> San Francisco Bay Area, California better use your sense - B. Dylan
>
>
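
The hash-database idea in the quoted reply could be sketched roughly as
follows. This is a standalone illustration, not Mailman code: the names
normalize, content_key and DuplicateStore, the HH:MM:SS time stamp pattern,
the one-week expiry and the in-memory dict are all assumptions. A real
handler would need to persist the store between messages and choose a
normalization rule that fits the mail it actually sees.

```python
import hashlib
import re
import time

# Assumed pattern for text that differs between otherwise identical mails,
# here HH:MM:SS time stamps. Adjust to whatever varies in your messages.
VOLATILE = re.compile(r'\d{1,2}:\d{2}:\d{2}')


def normalize(body):
    """Strip volatile parts and collapse whitespace before hashing."""
    body = VOLATILE.sub('', body)
    return ' '.join(body.split())


def content_key(body):
    """Return a hex digest identifying the normalized body."""
    data = normalize(body)
    if not isinstance(data, bytes):
        data = data.encode('utf-8')
    return hashlib.sha256(data).hexdigest()


class DuplicateStore(object):
    """Hypothetical in-memory map of content hash to the time first seen."""

    def __init__(self, max_age=7 * 24 * 3600):
        self.max_age = max_age
        self.seen = {}

    def check_and_add(self, body, now=None):
        """Return True if body was seen within max_age seconds, else record it."""
        if now is None:
            now = time.time()
        # Forget entries older than max_age so the store stays small.
        for key, stamp in list(self.seen.items()):
            if now - stamp > self.max_age:
                del self.seen[key]
        key = content_key(body)
        if key in self.seen:
            return True
        self.seen[key] = now
        return False
```

A handler would call check_and_add() with whatever part of the message counts
as its content and drop 'ToOutgoing' from the pipeline when it returns True.
Because the lookup happens during incoming processing, it does not depend on
the archiver having caught up.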