[Mailman-Developers] [PATCH] Duplicate mail avoiding in Mailman

Ben Gertzfield che@debian.org
Tue, 28 Aug 2001 22:37:37 +0900


Finally!  Here's the patch to avoid sending out list duplicates to
users who don't want them.  This is a solution to the never-ending
feuding on various Debian mailing lists about whether or not people
should Cc: folks already on the list; hopefully, this
user-configurable behavior will make everyone's lives easier.

This patch is against mailman CVS.

The patch adds a DontReceiveDuplicates user option flag to mm_cfg (bit
value 128).  By default the option is off (0) for every user, which is
the same as our current behavior.
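
To illustrate what "value 128" means, here is a minimal standalone
sketch.  It assumes per-user options are kept as a bitmask, which the
power-of-two values in Defaults.py.in suggest.  get_option and
set_option are hypothetical helpers written for this example; they are
not Mailman's getMemberOption/setMemberOption.

```python
# Option bit values as they appear in the Defaults.py.in hunk of the patch.
ConcealSubscription = 16
SuppressPasswordReminder = 32
ReceiveNonmatchingTopics = 64
DontReceiveDuplicates = 128


def get_option(flags, bit):
    """Return 1 if the option bit is set in flags, else 0 (hypothetical helper)."""
    return 1 if flags & bit else 0


def set_option(flags, bit, value):
    """Return flags with the option bit switched on or off (hypothetical helper)."""
    if value:
        return flags | bit
    return flags & ~bit


# A user who has one unrelated option on and has never touched nodupes.
flags = ConcealSubscription
nodupes_before = get_option(flags, DontReceiveDuplicates)

# Turning nodupes on leaves the other options alone.
flags = set_option(flags, DontReceiveDuplicates, 1)
nodupes_after = get_option(flags, DontReceiveDuplicates)
conceal_after = get_option(flags, ConcealSubscription)
```

Because the new bit is unset for existing users, nobody's delivery
changes until they opt in.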

If DontReceiveDuplicates is set to 1 for the user, then they will not
receive a list message if:

1) the qrunner process has already sent a mail with that message-id to
   that user

-or-

2) the user is specifically listed in the To:, Cc:, Resent-To:, or Resent-Cc:
   fields.

If DontReceiveDuplicates is set to 0 and Personalization is enabled
for the list, an X-Mailman-Duplicate: yes header will be added to every
duplicate message sent out (not the first) to allow users to filter
for themselves.
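
The rules above can be sketched as a small standalone function.  This
is a simplified model in modern Python, not the handler from the patch:
classify, seen, no_dupes and explicit_recips are names made up for the
example, and the 'flag' result only turns into a real header when the
list has Personalization enabled.

```python
def classify(recipient, msgid, seen, no_dupes, explicit_recips=()):
    """Decide what to do with one list recipient for one message.

    seen maps an address to the set of Message-IDs it has already been
    sent (or will get directly via To:/Cc:/Resent-To:/Resent-Cc:).
    Returns 'send' for a first copy, 'drop' for a duplicate the user
    does not want, and 'flag' for a duplicate the user still wants.
    """
    # Addresses named in the headers count as having received the message.
    for addr in explicit_recips:
        seen.setdefault(addr, set()).add(msgid)

    ids = seen.setdefault(recipient, set())
    if msgid not in ids:
        ids.add(msgid)
        return 'send'
    return 'drop' if no_dupes else 'flag'


seen = {}
first = classify('a@example.org', '<1@example.org>', seen, no_dupes=True)
second = classify('a@example.org', '<1@example.org>', seen, no_dupes=True)
cced = classify('b@example.org', '<2@example.org>', seen, no_dupes=False,
                explicit_recips=['b@example.org'])
```

Here the first copy to a@example.org is sent, the second is dropped,
and b@example.org (who was Cc:ed but accepts duplicates) gets a copy
marked for the duplicate header.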

I have not done any i18n for this code, mea culpa.  But the option is
settable via the web interface or the mail interface (it's the
"nodupes" option via mail).

Issues:

1) What happens if people fake their Message-IDs?  It's possible (but
   hard) to stop real list mail from coming through to someone who
   doesn't get duplicates, if you send out lots of messages with the same
   Message-ID.

2) Does this use a lot of memory?  I'm using a two-level hash to store
   the message IDs and email addresses that have been seen; I don't know
   if this will grow huge on big sites.

3) Is there any way to add the X-Mailman-Duplicate: yes header to various
   users when the admin is not using Personalization on the lists?

4) If multiple qrunner processes end up handling mails, it's possible
   a user who doesn't want duplicates will get them.  How can we fix
   this?

5) Some users probably don't want list duplicates if the message is
   addressed to multiple lists that they belong to, but DO want
   duplicates if they're specifically listed in To:/Cc:/etc.  Should we
   make yet another option for this?
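
On issue 2, one possible way to keep the table from growing without
bound is to cap the number of Message-IDs remembered and forget the
oldest ones.  The sketch below is only an illustration in modern
Python; SeenTable, mark and max_msgids are hypothetical names, and
unlike the patch it keys the outer level on Message-ID rather than on
address so that a whole message can be evicted cheaply.

```python
from collections import OrderedDict


class SeenTable:
    """Remember which addresses got which Message-ID, up to a fixed cap."""

    def __init__(self, max_msgids=1000):
        self.max_msgids = max_msgids
        self._by_msgid = OrderedDict()   # msgid -> set of addresses

    def mark(self, msgid, addr):
        """Record addr for msgid; return True if it was already recorded."""
        addrs = self._by_msgid.get(msgid)
        if addrs is None:
            addrs = self._by_msgid[msgid] = set()
            if len(self._by_msgid) > self.max_msgids:
                # Forget the oldest Message-ID to stay under the cap.
                self._by_msgid.popitem(last=False)
        already = addr in addrs
        addrs.add(addr)
        return already

    def __len__(self):
        return len(self._by_msgid)


table = SeenTable(max_msgids=2)
new_copy = table.mark('<1@example.org>', 'a@example.org')
dup_copy = table.mark('<1@example.org>', 'a@example.org')
table.mark('<2@example.org>', 'a@example.org')
table.mark('<3@example.org>', 'a@example.org')   # evicts <1@example.org>
```

The trade-off is that a duplicate arriving after its Message-ID has
been evicted goes out again, and the table still lives in one process,
so this does nothing for issue 4.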

Please, comments are definitely appreciated.  The patch works for me,
as long as the previous patch I sent to the list (allowing users to
actually set options) is applied.

This and my previous patches are all available at:

http://nausicaa.interq.or.jp/mailman/

Patch follows.

diff -x CVS -x messages -ruN mailman.orig/Mailman/Cgi/options.py mailman/Mailman/Cgi/options.py
--- mailman.orig/Mailman/Cgi/options.py	Thu Aug  2 13:14:43 2001
+++ mailman/Mailman/Cgi/options.py	Tue Aug 28 22:10:21 2001
@@ -375,6 +375,7 @@
                            ('conceal',     mm_cfg.ConcealSubscription),
                            ('remind',      mm_cfg.SuppressPasswordReminder),
                            ('rcvtopic',    mm_cfg.ReceiveNonmatchingTopics),
+                           ('nodupes',     mm_cfg.DontReceiveDuplicates),
                            ):
             try:
                 newval = int(cgidata.getvalue(item))
@@ -449,9 +450,18 @@
                     global_remind = newval
                     break
 
-        if global_enable is not None or global_remind is not None:
+        global_nodupes = None
+        if cgidata.getvalue('nodupes-globally'):
+            for flag, newval in newvals:
+                if flag == mm_cfg.DontReceiveDuplicates:
+                    global_nodupes = newval
+                    break
+
+        if (global_enable is not None or global_remind is not None
+            or global_nodupes is not None):
             for gmlist in lists_of_member(mlist.host_name, user):
-                global_options(gmlist, user, global_enable, global_remind)
+                global_options(gmlist, user, global_enable, global_remind,
+                               global_nodupes)
 
         # Now print the results
         if cantdigest:
@@ -526,6 +536,10 @@
         mlist.FormatOptionButton(mm_cfg.ConcealSubscription, 0, user))
     replacements['<mm-hide-subscription-button>'] = mlist.FormatOptionButton(
         mm_cfg.ConcealSubscription, 1, user)
+    replacements['<mm-dont-receive-duplicates-button>'] = (
+        mlist.FormatOptionButton(mm_cfg.DontReceiveDuplicates, 1, user))
+    replacements['<mm-receive-duplicates-button>'] = (
+        mlist.FormatOptionButton(mm_cfg.DontReceiveDuplicates, 0, user))
     replacements['<mm-unsubscribe-button>'] = (
         mlist.FormatButton('unsub', _('Unsubscribe')) + '<br>' +
         CheckBox('unsubconfirm', 1, checked=0).Format() +
@@ -555,6 +569,8 @@
         CheckBox('deliver-globally', 1, checked=0).Format())
     replacements['<mm-global-remind-button>'] = (
         CheckBox('remind-globally', 1, checked=0).Format())
+    replacements['<mm-global-nodupes-button>'] = (
+        CheckBox('nodupes-globally', 1, checked=0).Format())
 
     days = int(mm_cfg.PENDING_REQUEST_LIFE / mm_cfg.days(1))
     if days > 1:
@@ -741,7 +757,7 @@
 
 
 
-def global_options(mlist, user, global_enable, global_remind):
+def global_options(mlist, user, global_enable, global_remind, global_nodupes):
     def sigterm_handler(signum, frame, mlist=mlist):
         # Make sure the list gets unlocked...
         mlist.Unlock()
@@ -762,6 +778,10 @@
         if global_remind is not None:
             mlist.setMemberOption(user, mm_cfg.SuppressPasswordReminder,
                                   global_remind)
+
+        if global_nodupes is not None:
+            mlist.setMemberOption(user, mm_cfg.DontReceiveDuplicates,
+                                  global_nodupes)
 
         mlist.Save()
     finally:
diff -x CVS -x messages -ruN mailman.orig/Mailman/Defaults.py.in mailman/Mailman/Defaults.py.in
--- mailman.orig/Mailman/Defaults.py.in	Tue Aug 21 00:04:21 2001
+++ mailman/Mailman/Defaults.py.in	Tue Aug 28 21:15:10 2001
@@ -269,6 +269,7 @@
     'Hold',
     'Tagger',
     'CalcRecips',
+    'AvoidDuplicates',
     'Cleanse',
     'CookHeaders',
     'ToDigest',
@@ -722,6 +723,7 @@
 ConcealSubscription = 16
 SuppressPasswordReminder = 32
 ReceiveNonmatchingTopics = 64
+DontReceiveDuplicates    = 128
 
 # Authentication contexts.
 #
diff -x CVS -x messages -ruN mailman.orig/Mailman/HTMLFormatter.py mailman/Mailman/HTMLFormatter.py
--- mailman.orig/Mailman/HTMLFormatter.py	Sat Aug 18 06:18:43 2001
+++ mailman/Mailman/HTMLFormatter.py	Tue Aug 28 21:15:31 2001
@@ -117,6 +117,7 @@
                 mm_cfg.ConcealSubscription      : 'conceal',
                 mm_cfg.SuppressPasswordReminder : 'remind',
                 mm_cfg.ReceiveNonmatchingTopics : 'rcvtopic',
+                mm_cfg.DontReceiveDuplicates    : 'nodupes',
                 }[type]
         return '<input type=radio name="%s" value="%d"%s>' % (
             name, value, checked)
diff -x CVS -x messages -ruN mailman.orig/Mailman/Handlers/AvoidDuplicates.py mailman/Mailman/Handlers/AvoidDuplicates.py
--- mailman.orig/Mailman/Handlers/AvoidDuplicates.py	Thu Jan  1 09:00:00 1970
+++ mailman/Mailman/Handlers/AvoidDuplicates.py	Tue Aug 28 22:17:46 2001
@@ -0,0 +1,118 @@
+# Copyright (C) 1998,1999,2000,2001 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software 
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+"""If the user wishes it, do not send duplicates of the same message.
+
+This module keeps an in-memory dictionary of Message-ID and recipient
+pairs.  If a message with an identical Message-ID is about to be sent
+to someone who has already received a copy, we either drop the message,
+add a duplicate warning header, or pass it through, depending on the
+user's preferences.
+"""
+
+import string
+
+from Mailman import mm_cfg
+from Mailman import Utils
+from Mailman import Message
+from Mailman import Errors
+from Mailman.i18n import _
+from mimelib.address import getaddresses
+
+
+
+class DuplicateDetected(Errors.DiscardMessage):
+    """The message would have been sent multiple times to a user who
+    prefers not to receive duplicates."""
+
+# A dictionary of dictionaries, used to store which recipients have received
+# which message IDs.
+recip_msgids = {}
+
+
+
+def process(mlist, msg, msgdata):
+
+    recips = msgdata['recips']
+    msgid = msg.get('message-id')
+    
+    if not recips or not msgid:
+        return
+
+    # This dictionary will hold recips who want their mail to have
+    # the X-Mailman-Duplicate: yes header.
+    if not msgdata.has_key('add-dupe-header'):
+        msgdata['add-dupe-header'] = {}
+
+    # Happily borrowed from mimelib.getaddresses() example
+    tos = msg.getall('to')
+    ccs = msg.getall('cc')
+    resent_tos = msg.getall('resent-to')
+    resent_ccs = msg.getall('resent-cc')
+    external_recips = getaddresses(tos + ccs + resent_tos + resent_ccs)
+
+    # Anyone mentioned in the to/cc/resent-to/resent-cc headers should
+    # not get a duplicate of the message.
+    for (name, email) in external_recips:
+
+        # If getaddresses fails, email could be null. Skip those.
+        if not email:
+            continue
+        
+        # Initialize the external recipient's msgid hash if this is the
+        # first email they've received with this message-id.
+        if not recip_msgids.has_key(email):
+            recip_msgids[email] = {}
+
+        # We don't do anything except record that that address has
+        # gotten or will get a copy of this email externally.
+        recip_msgids[email][msgid] = 1
+
+    newrecips = []
+
+    for r in recips:
+        if not recip_msgids.has_key(r):
+            recip_msgids[r] = {}
+
+        # If they have received a message with this message-id already,
+        # see if they don't want duplicates.
+        if recip_msgids[r].has_key(msgid):
+            send_duplicate = 1
+            
+            # If the member wants to receive duplicates, or if the recipient 
+            # is not a member at all, just flag the X-Mailman-Duplicate: yes
+            # header.
+            try:
+                if mlist.getMemberOption(r, mm_cfg.DontReceiveDuplicates):
+                    send_duplicate = 0
+            except Errors.NotAMemberError:
+                pass
+
+            # We'll send a duplicate unless the user doesn't wish it.
+            # If personalization is enabled, the add-dupe-header flag will
+            # add an X-Mailman-Duplicate: yes header for this user's message.
+            if send_duplicate:
+                msgdata['add-dupe-header'][r] = 1
+                newrecips.append(r)
+
+        else:
+            # Otherwise, this is the first time they've been in the recips
+            # list.  Add them to the newrecips list and flag them as having
+            # received this message.
+            recip_msgids[r][msgid] = 1
+            newrecips.append(r)
+
+    msgdata['recips'] = newrecips
diff -x CVS -x messages -ruN mailman.orig/Mailman/Handlers/Personalize.py mailman/Mailman/Handlers/Personalize.py
--- mailman.orig/Mailman/Handlers/Personalize.py	Sat Aug 18 06:20:58 2001
+++ mailman/Mailman/Handlers/Personalize.py	Tue Aug 28 21:10:17 2001
@@ -46,11 +46,23 @@
             msg['To'] = '%s (%s)' % (member, name)
         else:
             msg['To'] = member
+
+        # We can flag the mail as a duplicate for each member, if
+        # they've already received that message. (See AvoidDuplicates.py)
+        if msgdata.get('add-dupe-header', {}).has_key(member):
+            msg['X-Mailman-Duplicate'] = 'yes'
+        elif msg.has_key('X-Mailman-Duplicate'):
+            del msg['X-Mailman-Duplicate']
+
         inq.enqueue(msg, metadatacopy, listname=mlist.internal_name())
 
     # Restore the original To: line
     del msg['To']
     msg['To'] = originalto
+
+    # The original message is not a duplicate.
+    if msg.has_key('X-Mailman-Duplicate'):
+        del msg['X-Mailman-Duplicate']
 
     # Don't let the normal ToOutgoing processing actually send the original
     # copy.
diff -x CVS -x messages -ruN mailman.orig/Mailman/MailCommandHandler.py mailman/Mailman/MailCommandHandler.py
--- mailman.orig/Mailman/MailCommandHandler.py	Fri Aug 17 14:37:15 2001
+++ mailman/Mailman/MailCommandHandler.py	Tue Aug 28 21:19:44 2001
@@ -80,27 +80,36 @@
 you get digests in MIME format, which are much better if you have a mail
 reader that supports MIME.""")
 
-option_desc = {'hide'    : HIDE,
-               'nomail'  : NOMAIL,
-               'ack'     : ACK,
-               'notmetoo': NOTMETOO,
-               'digest'  : DIGEST,
-               'plain'   : PLAIN,
+NODUPES = _("""When turned on, you do *not* receive duplicate mails if mail is
+sent to multiple lists that you belong to.  This option will let you avoid
+duplicate mails; if you turn it on, you will never receive multiple copies
+of the same message.  Also, if you *and* the list are mentioned explicitly
+in the To: or Cc: headers of a message, you will not receive duplicates if
+this is turned on.""")
+
+option_desc = {'hide'     : HIDE,
+               'nomail'   : NOMAIL,
+               'ack'      : ACK,
+               'notmetoo' : NOTMETOO,
+               'digest'   : DIGEST,
+               'plain'    : PLAIN,
+               'nodupes'  : NODUPES,
                }
 
 # jcrey: and then the real one
 _ = Mailman.i18n._
 
-option_info = {'hide'    : mm_cfg.ConcealSubscription,
-               'nomail'  : mm_cfg.DisableDelivery,
-               'ack'     : mm_cfg.AcknowledgePosts,
-               'notmetoo': mm_cfg.DontReceiveOwnPosts,
-               'digest'  : 0,
-               'plain'   : mm_cfg.DisableMime,
+option_info = {'hide'     : mm_cfg.ConcealSubscription,
+               'nomail'   : mm_cfg.DisableDelivery,
+               'ack'      : mm_cfg.AcknowledgePosts,
+               'notmetoo' : mm_cfg.DontReceiveOwnPosts,
+               'digest'   : 0,
+               'plain'    : mm_cfg.DisableMime,
+               'nodupes'  : mm_cfg.DontReceiveDuplicates,
                }
 
 # ordered list
-options = ('hide', 'nomail', 'ack', 'notmetoo', 'digest', 'plain')
+options = ('hide', 'nomail', 'ack', 'notmetoo', 'digest', 'plain', 'nodupes')
 
 # strip just the outer layer of quotes
 quotecre = re.compile(r'["\'`](?P<cmd>.*)["\'`]')
diff -x CVS -x messages -ruN mailman.orig/templates/en/help.txt mailman/templates/en/help.txt
--- mailman.orig/templates/en/help.txt	Sat May 19 06:28:54 2001
+++ mailman/templates/en/help.txt	Tue Aug 28 21:21:56 2001
@@ -79,6 +79,10 @@
             Conceals your address when people look at who is on this
             list.
 
+        nodupes:
+            Turn this on if you do not want to receive duplicate mail
+            from the list, in case you are explicitly in the To: or Cc:
+            fields already or are included in multiple lists in one message.
 
     options
         Show the current values of your list options.
diff -x CVS -x messages -ruN mailman.orig/templates/en/options.html mailman/templates/en/options.html
--- mailman.orig/templates/en/options.html	Thu Jul 19 06:54:40 2001
+++ mailman/templates/en/options.html	Tue Aug 28 22:09:46 2001
@@ -280,6 +280,26 @@
         <mm-receive-nonmatching-topics>Yes
         </td></tr>
 
+    <tr><td bgcolor="#cccccc">
+        <strong>Avoid duplicate copies of messages?</strong><p>
+
+                When you are listed explicitly in the To: or Cc: headers
+                of a list message, or a message is sent to multiple lists
+                that you are a member of, you can opt to not receive another
+                copy from the mailing list.  Select <em>Yes</em> to avoid
+                receiving duplicate copies from the mailing list; select
+                <em>No</em> to receive duplicate copies.  
+
+                <p>If the list has per-message personalization
+                enabled, every duplicate mail will have an
+                <tt>X-Mailman-Duplicate: yes</tt> header added to it.
+
+        </td><td bgcolor="#cccccc">
+        <mm-receive-duplicates-button>No<br>
+        <mm-dont-receive-duplicates-button>Yes<p>
+        <mm-global-nodupes-button><i>Set globally</i>
+        </td></tr>
+
     <tr><TD colspan="2">
         <center><MM-options-Submit-button></center>
         </td></tr>


-- 
Brought to you by the letters M and V and the number 8.
"Hoosh is a kind of soup."
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/