[PATCH] Duplicate mail avoiding in Mailman

Finally! Here's the patch to avoid sending out list duplicates to users who don't want them. This is a solution to the never-ending feuding on various Debian mailing lists about whether or not people should Cc: folks who are already on the list; hopefully, this user-configurable behavior will make everyone's lives easier.

This patch is against Mailman CVS. It adds a DontReceiveDuplicates flag to mm_cfg (value 128). By default this flag is 0, which matches the current behavior. If DontReceiveDuplicates is set to 1 for a user, they will not receive a list message if:

1) the qrunner process has already sent a mail with that Message-ID to that user, or
2) the user is explicitly listed in the To:, Cc:, Resent-To:, or Resent-Cc: fields.

If DontReceiveDuplicates is set to 0 and personalization is enabled for the list, an X-Mailman-Duplicate: yes header will be added to every duplicate message sent out (but not to the first copy), so that users can filter for themselves.

I have not done any i18n for this code, mea culpa. But the option can be set via the web interface or the mail interface (it's the "nodupes" option via mail).

Issues:

1) What happens if people fake their Message-IDs? By sending out lots of messages with the same Message-ID, it's possible (but hard) to stop real list mail from reaching someone who doesn't want duplicates.

2) Does this use a lot of memory? I'm using a two-level hash to store the Message-IDs and email addresses that have been seen; I don't know whether it will grow huge on big sites.

3) Is there any way to add the X-Mailman-Duplicate: yes header to individual users' copies when the admin is not using personalization on the list?

4) If multiple qrunner processes end up handling mail, a user who doesn't want duplicates may still get them. How can we fix this?

5) Some users probably don't want list duplicates when a message is addressed to multiple lists they belong to, but DO want duplicates when they're explicitly listed in To:/Cc:/etc. Should we make yet another option for this?

Please, comments are definitely appreciated. The patch works for me, as long as the previous patch I sent to the list (which allows users to actually set options) is applied. This and my previous patches are all available at:

http://nausicaa.interq.or.jp/mailman/

Patch follows.

diff -x CVS -x messages -ruN mailman.orig/Mailman/Cgi/options.py mailman/Mailman/Cgi/options.py
--- mailman.orig/Mailman/Cgi/options.py Thu Aug 2 13:14:43 2001
+++ mailman/Mailman/Cgi/options.py Tue Aug 28 22:10:21 2001
@@ -375,6 +375,7 @@
                          ('conceal', mm_cfg.ConcealSubscription),
                          ('remind', mm_cfg.SuppressPasswordReminder),
                          ('rcvtopic', mm_cfg.ReceiveNonmatchingTopics),
+                         ('nodupes', mm_cfg.DontReceiveDuplicates),
                          ):
             try:
                 newval = int(cgidata.getvalue(item))
@@ -449,9 +450,18 @@
                     global_remind = newval
                     break

-        if global_enable is not None or global_remind is not None:
+        global_nodupes = None
+        if cgidata.getvalue('nodupes-globally'):
+            for flag, newval in newvals:
+                if flag == mm_cfg.DontReceiveDuplicates:
+                    global_nodupes = newval
+                    break
+
+        if (global_enable is not None or global_remind is not None
+            or global_nodupes is not None):
             for gmlist in lists_of_member(mlist.host_name, user):
-                global_options(gmlist, user, global_enable, global_remind)
+                global_options(gmlist, user, global_enable, global_remind,
+                               global_nodupes)

         # Now print the results
         if cantdigest:
@@ -526,6 +536,10 @@
         mlist.FormatOptionButton(mm_cfg.ConcealSubscription, 0, user))
     replacements['<mm-hide-subscription-button>'] = mlist.FormatOptionButton(
         mm_cfg.ConcealSubscription, 1, user)
+    replacements['<mm-dont-receive-duplicates-button>'] = (
+        mlist.FormatOptionButton(mm_cfg.DontReceiveDuplicates, 1, user))
+    replacements['<mm-receive-duplicates-button>'] = (
+        mlist.FormatOptionButton(mm_cfg.DontReceiveDuplicates, 0, user))
     replacements['<mm-unsubscribe-button>'] = (
         mlist.FormatButton('unsub', _('Unsubscribe')) + '<br>' +
         CheckBox('unsubconfirm', 1, checked=0).Format() +
@@ -555,6 +569,8 @@
         CheckBox('deliver-globally', 1, checked=0).Format())
     replacements['<mm-global-remind-button>'] = (
         CheckBox('remind-globally', 1, checked=0).Format())
+    replacements['<mm-global-nodupes-button>'] = (
+        CheckBox('nodupes-globally', 1, checked=0).Format())

     days = int(mm_cfg.PENDING_REQUEST_LIFE / mm_cfg.days(1))
     if days > 1:
@@ -741,7 +757,7 @@



-def global_options(mlist, user, global_enable, global_remind):
+def global_options(mlist, user, global_enable, global_remind, global_nodupes):
     def sigterm_handler(signum, frame, mlist=mlist):
         # Make sure the list gets unlocked...
         mlist.Unlock()
@@ -762,6 +778,10 @@
         if global_remind is not None:
             mlist.setMemberOption(user, mm_cfg.SuppressPasswordReminder,
                                   global_remind)
+
+        if global_nodupes is not None:
+            mlist.setMemberOption(user, mm_cfg.DontReceiveDuplicates,
+                                  global_nodupes)

         mlist.Save()
     finally:
diff -x CVS -x messages -ruN mailman.orig/Mailman/Defaults.py.in mailman/Mailman/Defaults.py.in
--- mailman.orig/Mailman/Defaults.py.in Tue Aug 21 00:04:21 2001
+++ mailman/Mailman/Defaults.py.in Tue Aug 28 21:15:10 2001
@@ -269,6 +269,7 @@
     'Hold',
     'Tagger',
     'CalcRecips',
+    'AvoidDuplicates',
     'Cleanse',
     'CookHeaders',
     'ToDigest',
@@ -722,6 +723,7 @@
 ConcealSubscription = 16
 SuppressPasswordReminder = 32
 ReceiveNonmatchingTopics = 64
+DontReceiveDuplicates = 128

 # Authentication contexts.
 #
diff -x CVS -x messages -ruN mailman.orig/Mailman/HTMLFormatter.py mailman/Mailman/HTMLFormatter.py
--- mailman.orig/Mailman/HTMLFormatter.py Sat Aug 18 06:18:43 2001
+++ mailman/Mailman/HTMLFormatter.py Tue Aug 28 21:15:31 2001
@@ -117,6 +117,7 @@
                 mm_cfg.ConcealSubscription : 'conceal',
                 mm_cfg.SuppressPasswordReminder : 'remind',
                 mm_cfg.ReceiveNonmatchingTopics : 'rcvtopic',
+                mm_cfg.DontReceiveDuplicates : 'nodupes',
                 }[type]
         return '<input type=radio name="%s" value="%d"%s>' % (
             name, value, checked)
diff -x CVS -x messages -ruN mailman.orig/Mailman/Handlers/AvoidDuplicates.py mailman/Mailman/Handlers/AvoidDuplicates.py
--- mailman.orig/Mailman/Handlers/AvoidDuplicates.py Thu Jan 1 09:00:00 1970
+++ mailman/Mailman/Handlers/AvoidDuplicates.py Tue Aug 28 22:17:46 2001
@@ -0,0 +1,118 @@
+# Copyright (C) 1998,1999,2000,2001 by the Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+"""If the user wishes it, do not send duplicates of the same message.
+
+This module keeps an in-memory dictionary of Message-ID and recipient
+pairs. If a message with an identical Message-ID is about to be sent
+to someone who has already received a copy, we either drop the message,
+add a duplicate warning header, or pass it through, depending on the
+user's preferences.
+"""
+
+import string
+
+from Mailman import mm_cfg
+from Mailman import Utils
+from Mailman import Message
+from Mailman import Errors
+from Mailman.i18n import _
+from mimelib.address import getaddresses
+
+
+
+class DuplicateDetected(Errors.DiscardMessage):
+    """The message would have been sent multiple times to a user who
+    prefers not to receive duplicates."""
+
+# A dictionary of dictionaries, used to store which recipients have received
+# which message IDs.
+recip_msgids = {}
+
+
+
+def process(mlist, msg, msgdata):
+
+    recips = msgdata['recips']
+    msgid = msg.get('message-id')
+
+    if not recips or not msgid:
+        return
+
+    # This dictionary will hold recips who want their mail to have
+    # the X-Mailman-Duplicate: yes header.
+    if not msgdata.has_key('add-dupe-header'):
+        msgdata['add-dupe-header'] = {}
+
+    # Happily borrowed from mimelib.getaddresses() example
+    tos = msg.getall('to')
+    ccs = msg.getall('cc')
+    resent_tos = msg.getall('resent-to')
+    resent_ccs = msg.getall('resent-cc')
+    external_recips = getaddresses(tos + ccs + resent_tos + resent_ccs)
+
+    # Anyone mentioned in the to/cc/resent-to/resent-cc headers should
+    # not get a duplicate of the message.
+    for (name, email) in external_recips:
+
+        # If getaddresses fails, email could be null. Skip those.
+        if not email:
+            continue
+
+        # Initialize the external recipient's msgid hash if this is the
+        # first email they've received with this message-id.
+        if not recip_msgids.has_key(email):
+            recip_msgids[email] = {}
+
+        # We don't do anything except record that that address has
+        # gotten or will get a copy of this email externally.
+        recip_msgids[email][msgid] = 1
+
+    newrecips = []
+
+    for r in recips:
+        if not recip_msgids.has_key(r):
+            recip_msgids[r] = {}
+
+        # If they have received a message with this message-id already,
+        # see if they don't want duplicates.
+        if recip_msgids[r].has_key(msgid):
+            send_duplicate = 1
+
+            # If the member wants to receive duplicates, or if the recipient
+            # is not a member at all, just flag the X-Mailman-Duplicate: yes
+            # header.
+            try:
+                if mlist.getMemberOption(r, mm_cfg.DontReceiveDuplicates):
+                    send_duplicate = 0
+            except Errors.NotAMemberError:
+                pass
+
+            # We'll send a duplicate unless the user doesn't wish it.
+            # If personalization is enabled, the add-dupe-header flag will
+            # add a X-Mailman-Duplicate: yes header for this user's message.
+            if send_duplicate:
+                msgdata['add-dupe-header'][r] = 1
+                newrecips.append(r)
+
+        else:
+            # Otherwise, this is the first time they've been in the recips
+            # list. Add them to the newrecips list and flag them as having
+            # received this message.
+            recip_msgids[r][msgid] = 1
+            newrecips.append(r)
+
+    msgdata['recips'] = newrecips
diff -x CVS -x messages -ruN mailman.orig/Mailman/Handlers/Personalize.py mailman/Mailman/Handlers/Personalize.py
--- mailman.orig/Mailman/Handlers/Personalize.py Sat Aug 18 06:20:58 2001
+++ mailman/Mailman/Handlers/Personalize.py Tue Aug 28 21:10:17 2001
@@ -46,11 +46,23 @@
                 msg['To'] = '%s (%s)' % (member, name)
             else:
                 msg['To'] = member
+
+            # We can flag the mail as a duplicate for each member, if
+            # they've already received that message. (See AvoidDuplicates.py)
+            if msgdata['add-dupe-header'].has_key(member):
+                msg['X-Mailman-Duplicate'] = 'yes'
+            elif msg.has_key('X-Mailman-Duplicate'):
+                del msg['X-Mailman-Duplicate']
+
             inq.enqueue(msg, metadatacopy, listname=mlist.internal_name())

         # Restore the original To: line
         del msg['To']
         msg['To'] = originalto
+
+        # The original message is not a duplicate.
+        if msg.has_key('X-Mailman-Duplicate'):
+            del msg['X-Mailman-Duplicate']

     # Don't let the normal ToOutgoing processing actually send the original
     # copy.
diff -x CVS -x messages -ruN mailman.orig/Mailman/MailCommandHandler.py mailman/Mailman/MailCommandHandler.py
--- mailman.orig/Mailman/MailCommandHandler.py Fri Aug 17 14:37:15 2001
+++ mailman/Mailman/MailCommandHandler.py Tue Aug 28 21:19:44 2001
@@ -80,27 +80,36 @@
 you get digests in MIME format, which are much better if you have a mail
 reader that supports MIME.""")

-option_desc = {'hide'    : HIDE,
-               'nomail'  : NOMAIL,
-               'ack'     : ACK,
-               'notmetoo': NOTMETOO,
-               'digest'  : DIGEST,
-               'plain'   : PLAIN,
+NODUPES = _("""When turned on, you do *not* receive duplicate mails if mail is
+sent to multiple lists that you belong to. This option will let you avoid
+duplicate mails; if you turn it on, you will never receive multiple copies
+of the same message. Also, if you *and* the list are mentioned explicitly
+in the To: or Cc: headers of a message, you will not receive duplicates if
+this is turned on.""")
+
+option_desc = {'hide'     : HIDE,
+               'nomail'   : NOMAIL,
+               'ack'      : ACK,
+               'notmetoo' : NOTMETOO,
+               'digest'   : DIGEST,
+               'plain'    : PLAIN,
+               'nodupes'  : NODUPES,
                }

 # jcrey: and then the real one
 _ = Mailman.i18n._

-option_info = {'hide'    : mm_cfg.ConcealSubscription,
-               'nomail'  : mm_cfg.DisableDelivery,
-               'ack'     : mm_cfg.AcknowledgePosts,
-               'notmetoo': mm_cfg.DontReceiveOwnPosts,
-               'digest'  : 0,
-               'plain'   : mm_cfg.DisableMime,
+option_info = {'hide'     : mm_cfg.ConcealSubscription,
+               'nomail'   : mm_cfg.DisableDelivery,
+               'ack'      : mm_cfg.AcknowledgePosts,
+               'notmetoo' : mm_cfg.DontReceiveOwnPosts,
+               'digest'   : 0,
+               'plain'    : mm_cfg.DisableMime,
+               'nodupes'  : mm_cfg.DontReceiveDuplicates
                }

 # ordered list
-options = ('hide', 'nomail', 'ack', 'notmetoo', 'digest', 'plain')
+options = ('hide', 'nomail', 'ack', 'notmetoo', 'digest', 'plain', 'nodupes')

 # strip just the outer layer of quotes
 quotecre = re.compile(r'["\'`](?P<cmd>.*)["\'`]')
diff -x CVS -x messages -ruN mailman.orig/templates/en/help.txt mailman/templates/en/help.txt
--- mailman.orig/templates/en/help.txt Sat May 19 06:28:54 2001
+++ mailman/templates/en/help.txt Tue Aug 28 21:21:56 2001
@@ -79,6 +79,10 @@
             Conceals your address when people look at who is on this
             list.

+        nodupes:
+            Turn this on if you do not want to receive duplicate mail
+            from the list, in case you are explicitly in the To: or Cc:
+            fields already or are included in multiple lists in one message.

     options
         Show the current values of your list options.
diff -x CVS -x messages -ruN mailman.orig/templates/en/options.html mailman/templates/en/options.html
--- mailman.orig/templates/en/options.html Thu Jul 19 06:54:40 2001
+++ mailman/templates/en/options.html Tue Aug 28 22:09:46 2001
@@ -280,6 +280,26 @@
           <mm-receive-nonmatching-topics>Yes
         </td></tr>

+      <tr><td bgcolor="#cccccc">
+          <strong>Avoid duplicate copies of messages?</strong><p>
+
+          When you are listed explicitly in the To: or Cc: headers
+          of a list message, or a message is sent to multiple lists
+          that you are a member of, you can opt to not receive another
+          copy from the mailing list. Select <em>Yes</em> to avoid
+          receiving duplicate copies from the mailing list; select
+          <em>No</em> to receive duplicate copies.
+
+          <p>If the list has per-message personalization
+          enabled, every duplicate mail will have a
+          <tt>X-Mailman-Duplicate: yes</tt> header added to it.
+
+        </td><td bgcolor="#cccccc">
+          <mm-receive-duplicates-button>No<br>
+          <mm-dont-receive-duplicates-button>Yes<p>
+          <mm-global-nodupes-button><i>Set globally</i>
+        </td></tr>
+
       <tr><TD colspan="2">
         <center><MM-options-Submit-button></center>
       </td></tr>

-- 
Brought to you by the letters M and V and the number 8.
"Hoosh is a kind of soup."
Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/
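For readers who want the handler logic without the 2001-era APIs, here is a minimal sketch of what AvoidDuplicates.process() does, written for Python 3's standard email package instead of mimelib and has_key. It is not the patch's code: the DuplicateFilter class, the member_flags dictionary (standing in for mlist.getMemberOption()) and the max_ids cap are assumptions. The cap is one possible way to bound the memory growth asked about in issue 2, by forgetting the oldest Message-IDs first.

```python
from collections import OrderedDict
from email.message import EmailMessage
from email.utils import getaddresses

# Same bit value the patch adds to mm_cfg as DontReceiveDuplicates.
DONT_RECEIVE_DUPLICATES = 128


class DuplicateFilter:
    """Remember which addresses have already been sent which Message-IDs.

    Assumption (not in the patch): at most `max_ids` Message-IDs are kept,
    and the oldest one is forgotten first, so memory stays bounded.
    """

    def __init__(self, max_ids=10000):
        if max_ids < 1:
            raise ValueError('max_ids must be at least 1')
        self.max_ids = max_ids
        self.seen = OrderedDict()  # Message-ID -> set of lower-cased addresses

    def _seen_for(self, msgid):
        if msgid not in self.seen:
            self.seen[msgid] = set()
            if len(self.seen) > self.max_ids:
                self.seen.popitem(last=False)  # drop the oldest Message-ID
        return self.seen[msgid]

    def process(self, msg, recips, member_flags):
        """Return (addresses to deliver to, addresses to tag as duplicates).

        `member_flags` maps an address to its option bits and stands in for
        mlist.getMemberOption() (assumption). Unknown addresses count as
        non-members, who get a tagged duplicate rather than no copy, as in
        the patch.
        """
        msgid = msg.get('message-id')
        if not recips or not msgid:
            return list(recips), set()
        seen = self._seen_for(str(msgid))

        # Anyone named in To/Cc/Resent-To/Resent-Cc gets a copy directly.
        fields = []
        for name in ('to', 'cc', 'resent-to', 'resent-cc'):
            fields.extend(str(v) for v in msg.get_all(name, []))
        for _realname, addr in getaddresses(fields):
            if addr:
                seen.add(addr.lower())

        deliver, tagged = [], set()
        for r in recips:
            key = r.lower()
            if key not in seen:
                seen.add(key)  # first copy: deliver untouched
                deliver.append(r)
            elif member_flags.get(r, 0) & DONT_RECEIVE_DUPLICATES:
                continue  # opted out of duplicates: drop this copy
            else:
                tagged.add(r)  # would get X-Mailman-Duplicate: yes
                deliver.append(r)
        return deliver, tagged
```

Calling process() once per list for the same message reproduces the cross-list case: an address already seen via the first list, or via To:/Cc:, is either dropped or tagged on the second.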

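Issue 4 arises because each qrunner process keeps its own in-memory recip_msgids dictionary. One possible remedy, not something the thread settles on, is to keep the seen pairs in storage shared by all runners and make check-and-record a single atomic step. The sketch below assumes an SQLite file as that shared store; open_seen_db, first_delivery and the seen table are hypothetical names.

```python
import sqlite3


def open_seen_db(path):
    """Open (or create) a shared table of (Message-ID, address) pairs."""
    # timeout: wait for locks held by other processes instead of failing.
    conn = sqlite3.connect(path, timeout=30)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        " msgid TEXT NOT NULL,"
        " address TEXT NOT NULL,"
        " PRIMARY KEY (msgid, address))"
    )
    conn.commit()
    return conn


def first_delivery(conn, msgid, address):
    """Record the pair atomically; return True only for the first caller.

    INSERT OR IGNORE relies on the primary key, so two runners racing on
    the same pair cannot both see rowcount == 1.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen (msgid, address) VALUES (?, ?)",
            (msgid, address.lower()),
        )
    return cur.rowcount == 1
```

Each runner would open the same database file. The special name ':memory:' gives every connection its own private table, so it is only useful for a quick check and provides no cross-process deduplication.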
On Tue, 28 Aug 2001, Ben Gertzfield wrote:
I've just uploaded to the SourceForge patch tracker some code against CVS that solves a related or similar problem.
https://sourceforge.net/tracker/index.php?func=detail&aid=457706&group_id=103&atid=300103
It provides behaviour similar to how sendmail treats its aliases file. For example, given

    alias1: alias2, alias3, alias4
    alias2: a, b
    alias3: b, c
    alias4: c, d

mail to alias1 would be delivered only once to each address in the union of alias2, alias3 and alias4, which is a, b, c, d (b and c get one copy each, even though each appears under two aliases).
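The alias example boils down to a set union over a recursive expansion. A short illustrative sketch of those semantics follows; it is not Mark's UnionMemberAdaptor code, which the thread does not reproduce, and the expand() helper is hypothetical.

```python
def expand(name, aliases, _seen=None):
    """Recursively expand `name` into the set of final addresses.

    Using a set means every address appears once, however many
    sub-aliases list it. `_seen` records aliases already expanded, so
    repeated aliases (and alias loops) are not expanded twice.
    """
    if _seen is None:
        _seen = set()
    if name not in aliases:
        return {name}  # not an alias: a real address
    if name in _seen:
        return set()   # already expanded (or a loop)
    _seen.add(name)
    result = set()
    for member in aliases[name]:
        result |= expand(member, aliases, _seen)
    return result


aliases = {
    'alias1': ['alias2', 'alias3', 'alias4'],
    'alias2': ['a', 'b'],
    'alias3': ['b', 'c'],
    'alias4': ['c', 'd'],
}
print(sorted(expand('alias1', aliases)))  # ['a', 'b', 'c', 'd']
```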
We use it at the University Computer Club at UWA for a couple of our lists. For example, we have a technical discussion list which all members can join, but our systems administrators are automatically included.
It is an example of using the MemberAdaptor API in 2.1. I haven't documented it at all, so it's really only for those comfortable with Mailman's internals and Python.
Yours Mark
Mark Tearle - mark@tearle.com "Happiness, it seems to me, consists of two things: first, in being where you belong, and second - and best - in comfortably going through everyday life, that is, having had a good night's sleep and not being hurt by new shoes" - Fontane

"Mark" == Mark Tearle <mtearle@tearle.com> writes:
Mark> Provides functionality/behaviour similar to how sendmail
Mark> treats its aliases file.
Very cool! This is actually the reason I wrote the patch in the first place -- we're replacing an 800K sendmail aliases file (no joke!) with mailman, to make user administration actually possible.
Thanks!
Ben
-- Brought to you by the letters N and E and the number 19. "A calpac is a large cap." Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/

On Sun, 2 Sep 2001, Ben Gertzfield wrote:
Just drop UnionMemberAdaptor into MAILMANDIR/Mailman and extend.py into MAILMANDIR/lists/LISTNAME, and edit them appropriately.
However, it does reveal a couple of omissions in the MemberAdaptor API: there's no easy way to indicate that a list is read-only, or that a particular member is not editable. A clean way of exposing parameters to the web interface would also be great.
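The missing "read-only" capability can be made concrete with a small, hypothetical sketch. UnionMembership, isReadOnly and ReadOnlyMembershipError are invented for illustration; getMembers, isMember and setMemberOption only loosely echo Mailman 2.1's MemberAdaptor naming, and this is neither Mark's UnionMemberAdaptor nor the real API.

```python
class ReadOnlyMembershipError(Exception):
    """Raised when code tries to modify a derived (union) membership."""


class UnionMembership:
    """Hypothetical read-only membership built from several source lists.

    Not Mailman's MemberAdaptor: method names only loosely echo it.
    """

    def __init__(self, *sources):
        # Each source is a collection (e.g. a list) of addresses.
        self.sources = sources

    def isReadOnly(self):
        # The kind of capability flag the thread says the API lacks.
        return True

    def getMembers(self):
        members = set()
        for source in self.sources:
            members.update(addr.lower() for addr in source)
        return sorted(members)

    def isMember(self, address):
        return address.lower() in self.getMembers()

    def setMemberOption(self, address, flag, value):
        # Options of derived members must be edited on the source lists.
        raise ReadOnlyMembershipError(
            '%s is derived from other lists and cannot be edited here'
            % address)
```

For a list like the UCC technical list, the union of an admins list and a subscribers list would report every address once, while refusing option changes that belong on the source lists.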
Yours Mark

participants (2)
- Ben Gertzfield
- Mark Tearle