[Mailman-Users] Mailman throughput

Mon Aug 15 19:23:01 CEST 2011

Hi Brad,

On Aug 15, 2011, at 1:49 AM, Brad Knowles wrote:

On 08/14/2011 11:24 PM, Ivan Fetch wrote:

Brad, I think we are already accomplishing a lot of this minimalism,
since the MTA on the Mailman VM is only accepting the message via SMTP,
then handing it off to Mailman via the Postfix aliases. The spam and
other checks are done beforehand by another upstream gateway MTA. That
gateway then hands mailing list messages off to the Mailman box.

You're talking about inbound, and how you have outsourced many of these
kinds of checks to other boxes.  That's fine as far as it goes, but I
was talking about *outbound*, from Mailman to the world of recipients.

You are likely to have a certain number of messages coming into your
system which will require a certain amount of processing to scan them
for viruses and spam, etc.

However, on outbound, you will presumably have this same number of
messages multiplied by the number of recipients.

If that's an average of ten recipients per list, then you have a factor
of ten increase in the amount of work done to scan those messages for
viruses and spam -- and since all those messages are largely identical
in those regards, that's all wasted work, and therefore that's all work
that you want to avoid to the greatest degree possible.

As you scale up to thousands, tens of thousands, hundreds of thousands,
etc... numbers of recipients, the more work you can avoid doing on the
outbound side, the better.
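
To put rough numbers on the multiplication described above, here is a minimal sketch. The function name and the figure of 1,000 inbound messages are made up for illustration; only the factor of ten comes from the example above.

```python
def scan_workload(inbound_messages: int, avg_recipients: int) -> dict:
    """Compare scan counts when outbound copies are rescanned.

    Each inbound message is scanned once. If every outbound copy is
    scanned again, the outbound count grows with the recipient count,
    even though the copies are largely identical.
    """
    return {
        "inbound_scans": inbound_messages,
        "outbound_scans": inbound_messages * avg_recipients,
    }


# Hypothetical volume: 1,000 inbound messages, ten recipients per list.
print(scan_workload(1000, 10))
```

With these assumed figures, rescanning on the way out costs ten times the inbound work, and the ratio grows linearly with the number of recipients.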

OK - now we're on the same page. :) The MTA that Mailman relays to does not repeat processes like virus / spam scanning. We are re-working our gateways and relays over the next few months to further separate these roles; e.g., spam quarantine will be handled before a message hits Mailman, not after the message has been exploded to list subscribers.

This is true for subscribers which are not part of our organization
-  the MTA which Mailman relays to accepts the messages, and then deals
with any delivery issues. However, accounts for which this MTA is the
final destination, will tempfail under certain conditions, like
mismatched attributes in an LDAP record, or an issue with the mailstore.

And those are precisely the circumstances under which the MTA should not
be handing a tempfail condition back to Mailman.  It should go ahead and
blindly accept those messages and accept responsibility for them, and
then it should deal with those tempfail cases internally.

We are definitely moving to this (the MTA will accept whatever Mailman gives it). For the next few months, we will have some local accounts tempfailing, until we get off of Sun IMS or JSMS or whatever the product is named today. Part of why the relay is tempfailing is that we happen to be using a relay which is also a mailstore.

Mailman is really, really bad at handling large queues for all the same
reasons that MTAs from twenty years ago were bad at handling large
queues -- they're largely single threaded, disk bound, and use a single
outbound directory for all file locking and message queueing, which
means that they are absolutely decimated when it comes to having to scan
a linear linked list on disk when trying to store the next file or pull
up the next file.

Modern MTAs are fully multi-threaded, they keep their active queue in
memory as opposed to putting them on disk, and they hash the disk queues
for inactive messages over a large distributed set of directories so if
one process is working on the files in a given directory then the odds
are vanishingly small that any other process would be blocked waiting on
the lock for that directory.
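
As an illustration of the hashed-directory idea only (this is not the on-disk layout of any particular MTA), a queue file's location can be derived from a hash of its identifier, so that files spread across many small directories. The function name, the queue root path, and the choice of SHA-1 are assumptions made for this sketch.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_queue_path(queue_root: str, message_id: str, depth: int = 2) -> PurePosixPath:
    """Return a queue file path nested under `depth` single-character directories.

    The directory names are the leading hex characters of the digest, so
    messages are distributed over up to 16 ** depth directories.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    subdirs = list(digest[:depth])
    return PurePosixPath(queue_root, *subdirs, digest)


# Hypothetical queue root and message identifier.
print(hashed_queue_path("/var/spool/example-queue", "<msg-0001@example.org>"))
```

With a depth of two there are 256 possible directories, so two processes picking files at random rarely land in the same one.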

Ah, good to know RE: Mailman queueing. So, the only reason why things should be in qfiles/retry would be something like a relay being unavailable.
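
One way to keep an eye on this is to count the files under each queue directory. A minimal sketch, assuming a layout with one subdirectory per queue (such as retry) under qfiles; the function name is hypothetical and the path passed in depends on the installation.

```python
import os


def queue_depths(qfiles_dir: str) -> dict:
    """Map each queue subdirectory name to the number of files it holds."""
    depths = {}
    for entry in sorted(os.listdir(qfiles_dir)):
        path = os.path.join(qfiles_dir, entry)
        if not os.path.isdir(path):
            continue  # ignore stray files at the top level
        depths[entry] = sum(
            1
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        )
    return depths
```

Calling queue_depths() on the installation's qfiles directory from a cron job and watching the "retry" count would show whether tempfails are still piling up inside Mailman rather than in the relay.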

For better or worse, we are moving a lot of our mailboxes to mail
forwards over the next few months - this will move the rest of these
tempfails out of Mailman's SMTP / retry queue, and into the downstream
relay (where they belong).

From Mailman's perspective, your local MTA *IS* the downstream relay,
and it should not be causing these kinds of loads to be put on Mailman.

Pull as much of the queueing as possible out of Mailman and put it into
your local MTA.  From there, it becomes an MTA problem, and it doesn't
matter to Mailman whether the mailboxes are local or remote.

When you say "local MTA", you don't mean strictly local to the Mailman box, right? I believe you mean local as in a separate relay box.

I say all this as a specialist in designing and building large-scale
mail systems (such as AOL), a long-term member of the Mailman project,
and a member of the postmaster team for python.org, where all the
official Mailman mailing lists are hosted -- using Mailman.

Thanks Brad, for your time on this, and your later analogy RE: input and output.

- Ivan
