[Mailman-Developers] Limiting copies of cross posts sent

31 Jan 1999 05:54:18 +0100

[Christopher G. Petrilli]

> On Fri, Jan 29, 1999 at 02:46:59PM -0600, Jeffrey C. Ollie wrote:
> > Yes, existence of multiple SMTP daemons is a problem in that we'd have
> > to provide documentation on how  to modify the configuration for many
> > different daemons.  However doesn't this already happen?  Aren't
> > sendmail, exim, qmail, or whatever sufficiently different in their
> > implementation that setup is already problematic?
> 
> Dunno, at least with sendmail/postfix, it's just trivial additions to
> the /etc/aliases file.  I use the same ones for postfix as you would for
> sendmail, absolutely, no exceptions.  I THINK EXIM behaves the same
> way,

Once you've got the basic director and transport set up (in order to
pipe things into mailman under the right UID/GID), all you need to do
is add to an "aliases" file.

However: If we try to communicate envelope recipients to the mailman
pipe-alias command, we will have to do so in an MTA-specific way --
i.e. for the Exim case, one might do it in any of these ways:

 * Mailman can find the envelope recipient from the environment
   variables LOCAL_PART and DOMAIN.

 * Exim can invoke the Mailman pipe as

	<prefix>/mail/wrapper post test $pipe_addresses

 * Exim can add an Envelope-To: header to all locally delivered mail,
   which Mailman then could use.

None of these "solutions" would, of course, work for Sendmail, Qmail
or Postfix.  IOW, we don't want to do that.
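
Just to make the first two Exim-only options concrete, here is a rough
sketch of what the receiving script would have to do.  The function
name is made up for illustration, and I'm assuming the script sees the
same arguments as in the command line above (post, list name, then any
$pipe_addresses).  Nothing like this exists in Mailman today.

```python
import os
import sys


def envelope_recipients(environ=None, argv=None):
    """Return the envelope recipients handed over by an Exim pipe.

    Hypothetical helper, not part of Mailman.  Assumes argv looks like
    [program, "post", listname, addr1, addr2, ...].
    """
    environ = os.environ if environ is None else environ
    argv = sys.argv if argv is None else argv

    # Option 2: addresses appended to the command line ($pipe_addresses).
    extra = [arg.lower() for arg in argv[3:] if "@" in arg]
    if extra:
        return extra

    # Option 1: the LOCAL_PART and DOMAIN environment variables.
    local_part = environ.get("LOCAL_PART")
    domain = environ.get("DOMAIN")
    if local_part and domain:
        return [("%s@%s" % (local_part, domain)).lower()]

    # Any other MTA: nothing to go on, which is exactly the problem.
    return []
```

The last branch is the one every non-Exim site would end up in.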

Besides, even if a message is delivered in a single SMTP dialog with
multiple RCPT To:s, the MTA will split the local deliveries up into
one pipe delivery per envelope recipient.  Thus Mailman _will_ need to
keep some state between deliveries.

My current ideas on limiting duplicates are:

 * Consider message headers unreliable.  They are OK for collecting
   "hints" as to what lists a message _probably_ has been crossposted
   to, but it really should be the envelope recipients that make the
   call.

 * Mailman needs some way (heuristics could probably make do for now)
   of determining whether some header address does correspond to a
   (local) mailman list or not.

 * Whenever a message subject to duplicate removal gets injected into
   Mailman, register that message for "probable pending delivery" to
   the header addresses corresponding to local Mailman lists.

 * As the MTA runs the mailman pipes for the different envelope
   recipients, Mailman updates the "probable pending delivery" status
   for that list to say "pending delivery".

 * Depending on how well we want to utilize the "multiple RCPT To:"
   SMTP performance gain, Mailman could do either of

   A. As soon as a list gets status "pending delivery", ship the
      message off to the members of that list _that haven't already
      received it_.  Add all new addresses to some
      message.have_been_sent_to status variable.

   B. Add all members of the list to some message.to_be_delivered_to
      status variable, and wait for some time (see below) before
      starting actual delivery.

 * When either all the "probable pending delivery" addresses have been
   converted to "pending delivery" (plus, maybe, some small additional
   timeout to cater for any Bcc:ed list addresses?), or after some
   (site-configurable) timeout is reached, do the remaining pending
   deliveries, and garbage-collect the message status info.
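
A minimal sketch of the bookkeeping above, using variant A.  The class
and method names are invented for illustration (only
have_been_sent_to comes from the list above), the state lives in
memory rather than on disk, and the 300 second default timeout is an
arbitrary placeholder.

```python
import time

PROBABLE = "probable pending delivery"
PENDING = "pending delivery"


class CrossPostState:
    """Per-message state kept between pipe deliveries (hypothetical)."""

    def __init__(self, header_lists, timeout=300.0, now=None):
        # Lists guessed from the headers start out as "probable".
        self.created = time.time() if now is None else now
        self.timeout = timeout
        self.status = {name: PROBABLE for name in header_lists}
        self.have_been_sent_to = set()

    def list_arrived(self, listname, members):
        """The MTA ran the pipe for `listname`; return who to send to now.

        Variant A: deliver at once, skipping members that already got
        the message through another list.  A list that was not in the
        headers (e.g. Bcc:ed) is simply added with status "pending".
        """
        self.status[listname] = PENDING
        recipients = [m for m in members if m not in self.have_been_sent_to]
        self.have_been_sent_to.update(recipients)
        return recipients

    def finished(self, now=None):
        """True once the state can be garbage-collected."""
        now = time.time() if now is None else now
        all_seen = all(s == PENDING for s in self.status.values())
        return all_seen or (now - self.created) >= self.timeout
```

Variant B would collect the members into to_be_delivered_to instead
of returning them, and do one delivery once finished() turns true.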

The "I don't want to receive multiple copies" option should be a user
option, as I, for one, *would* like to receive multiple copies.

Some additional features, e.g. "Mailman shouldn't deliver list
messages to addresses already in the header" (leading to pretty decent
support for the "I want to send a message to this list _minus_ these
addresses" case :) could be nice for those that want them.
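
Something along these lines might do for the header check.  This is
only a sketch: the function name is made up, it looks at To: and Cc:
only, and it compares addresses case-insensitively, which is an
assumption rather than a decision.

```python
from email import message_from_string
from email.utils import getaddresses


def members_not_in_header(message_text, members):
    """Return the members that are not already named in To: or Cc:.

    Hypothetical helper, not part of Mailman.
    """
    msg = message_from_string(message_text)
    fields = msg.get_all("to", []) + msg.get_all("cc", [])
    # getaddresses() yields (real name, address) pairs.
    in_header = {addr.lower() for _name, addr in getaddresses(fields) if addr}
    return [m for m in members if m.lower() not in in_header]
```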

> > Another, more radical idea would be to completely replace the regular
> > SMTP daemon with a SMTP daemon written in Python that integrates
> > directly with MailMan.  I've been considering doing that for another,
> > more radical project that I have in mind.  The only problem with this is
> > that you have to dedicate a box to running MailMan and there would be
> > some performance issues, but I guess that this would be offset by the
> > ability to run MailMan on a Windows or Mac OS system.
> 
> I'm not sure this is a plus, considering I want a REALLY reliable system
> :-)  I don't have a problem with someone writing a Python MTA (I've
> thought about it many times) BUT... mailman can't use it, at least in my
> opinion.  Period.  The MTA has to deal with /etc/aliases files just like
> everyone else, it's a defacto-standard at this point.

Huh?  Are you saying that there are problems with having the "usual" MTA
deal with /etc/aliases, while the "Mailman/Python" MTA keeps its
"aliases" elsewhere?  This is the modus operandi of at least one other
pretty widespread MLM package, namely Lyris
(<URL:http://www.lyris.com/>).
-- 
Harald