[Mailman-Users] Mailman throughput

Brad Knowles brad at shub-internet.org
Mon Aug 15 09:49:33 CEST 2011


On 08/14/2011 11:24 PM, Ivan Fetch wrote:

> Brad, I think we are already accomplishing a lot of this minimalism,
> since the MTA on the Mailman VM is only accepting the message via SMTP,
> then handing it off to Mailman via the Postfix aliases. The spam and
> other checks are done beforehand, by another upstream gateway MTA. That
> gateway then hands mailing list messages off to the Mailman box.

You're talking about inbound, and how you have outsourced many of these 
kinds of checks to other boxes.  That's fine as far as it goes, but I 
was talking about *outbound*, from Mailman to the world of recipients.


You are likely to have a certain number of messages coming into your 
system which will require a certain amount of processing to scan them 
for viruses and spam, etc.

However, on outbound, you will presumably have this same number of 
messages multiplied by the number of recipients.

If that's an average of ten recipients per list, then you have a factor 
of ten increase in the amount of work done to scan those messages for 
viruses and spam -- and since all those messages are largely identical 
in those regards, that's all wasted work, and therefore that's all work 
that you want to avoid to the greatest degree possible.

As you scale up to thousands, tens of thousands, or hundreds of 
thousands of recipients, the more work you can avoid doing on the 
outbound side, the better.
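
To put rough numbers on the multiplication described above, here is a 
back-of-the-envelope sketch in Python. The post volume and list sizes 
are made-up inputs rather than measurements from any real site, and 
`scan_counts` is a hypothetical helper name.

```python
def scan_counts(inbound_messages, avg_recipients):
    """Return (inbound_scans, outbound_scans) for content scanning.

    Assumes one scan per inbound message, and one scan per delivered
    copy if the outbound path scans as well.
    """
    inbound_scans = inbound_messages
    outbound_scans = inbound_messages * avg_recipients
    return inbound_scans, outbound_scans


# Hypothetical load: 1,000 posts per day at several average list sizes.
for recipients in (10, 1000, 100000):
    inbound, outbound = scan_counts(1000, recipients)
    print("%7d recipients: %d inbound scans, %d outbound scans"
          % (recipients, inbound, outbound))
```

The inbound figure stays flat while the outbound figure grows with the 
list size, which is why the outbound path is the one to keep lean.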

> This is true for subscribers which are not part of our organization
> -  the MTA which Mailman relays to accepts the messages, and then deals
> with any delivery issues. However, accounts for which this MTA is the
> final destination, will tempfail under certain conditions, like
> mismatched attributes in an LDAP record, or an issue with the mailstore.

And those are precisely the circumstances under which the MTA should not 
be handing a tempfail condition back to Mailman.  It should go ahead and 
blindly accept those messages and accept responsibility for them, and 
then it should deal with those tempfail cases internally.
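
As a toy illustration of "accept first, deal with the tempfail 
internally", here is a minimal Python sketch. It is not how any real 
MTA is implemented; the `AcceptingRelay` class and its `deliver` 
callback are hypothetical names made up for the example.

```python
from collections import deque


class AcceptingRelay:
    """Toy relay: always accepts, and retries failed deliveries itself."""

    def __init__(self, deliver):
        self.deliver = deliver      # callable(recipient, message) -> bool
        self.deferred = deque()     # the relay's own retry queue

    def submit(self, recipient, message):
        """What the upstream sender (Mailman here) sees: always 250."""
        if not self.deliver(recipient, message):
            # Temporary failure: keep it here, do not push it back upstream.
            self.deferred.append((recipient, message))
        return 250

    def retry_deferred(self):
        """Retry each deferred message once; return how many remain."""
        for _ in range(len(self.deferred)):
            recipient, message = self.deferred.popleft()
            if not self.deliver(recipient, message):
                self.deferred.append((recipient, message))
        return len(self.deferred)


# Hypothetical failure: one mailbox is temporarily undeliverable.
broken = {"user@example.org"}
relay = AcceptingRelay(lambda rcpt, msg: rcpt not in broken)
status = relay.submit("user@example.org", "hello")  # accepted, then deferred
broken.clear()                      # e.g. the LDAP record gets fixed
remaining = relay.retry_deferred()  # the relay clears its own backlog
```

The sender only ever sees the accept, so the retry backlog lives in the 
relay's queue and never in Mailman's.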

Mailman is really, really bad at handling large queues, for all the same 
reasons that MTAs from twenty years ago were bad at handling large 
queues: it is largely single-threaded and disk bound, and it uses a 
single outbound directory for all file locking and message queueing. 
That means performance collapses once the queue grows, because storing 
the next file or pulling up the next file requires scanning what is 
effectively a linear list on disk.

Modern MTAs are fully multi-threaded, they keep their active queue in 
memory as opposed to putting it on disk, and they hash the disk queues 
for inactive messages over a large distributed set of directories, so 
that if one process is working on the files in a given directory, the 
odds are vanishingly small that any other process would be blocked 
waiting on the lock for that directory.
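
The directory hashing idea can be sketched in a few lines of Python. 
This is a generic illustration, not a description of any particular 
MTA's on-disk layout; `hashed_queue_path`, the use of SHA-1, and the 
spool path are all assumptions made for the example.

```python
import hashlib
import os


def hashed_queue_path(queue_root, queue_id, depth=2):
    """Spread queue files over nested one-character subdirectories.

    The subdirectory names come from the first `depth` hex digits of a
    SHA-1 digest of the queue ID, giving 16**depth leaf directories.
    """
    digest = hashlib.sha1(queue_id.encode("utf-8")).hexdigest()
    subdirs = list(digest[:depth])
    return os.path.join(queue_root, *(subdirs + [queue_id]))


# Hypothetical spool root and queue ID, for illustration only.
print(hashed_queue_path("/var/spool/example/deferred", "4A1B2C3D"))
```

With a depth of 2 the files land in one of 256 leaf directories, so two 
processes picked at random rarely contend for the same directory, and 
each directory stays short enough to scan quickly.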


You wouldn't put a Model-T Ford into a Formula-1 race today, and 
likewise you should not be depending on ancient queueing methods as your 
bottleneck for handling all your outgoing mail.

Or, if you have no choice but to depend on them at all, then you should 
minimize your dependence on them as much as you possibly can.

> For better or worse, we are moving a lot of our mailboxes to mail
> forwards over the next few months - this will move the rest of these
> tempfails out of Mailman's SMTP / retry queue, and into the downstream
> relay (where they belong).

From Mailman's perspective, your local MTA *IS* the downstream relay, 
and it should not be causing these kinds of loads to be put on Mailman.

Pull as much of the queueing as possible out of Mailman and put it into 
your local MTA.  From there, it becomes an MTA problem, and it doesn't 
matter to Mailman whether the mailboxes are local or remote.
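
On the Mailman 2.1 side, the hand-off to the local MTA is controlled 
from mm_cfg.py. The fragment below is a sketch only: the setting names 
are the ones found in Mailman 2.1's Defaults.py, but the values are 
placeholders, so check both against your installed Defaults.py before 
changing anything.

```python
# Fragment for Mailman/mm_cfg.py (Mailman 2.1). In a real mm_cfg.py these
# assignments go after the existing "from Defaults import *" line.

# Hand outgoing mail to an MTA over SMTP.
DELIVERY_MODULE = 'SMTPDirect'

# Assumption: the MTA that should own the queue runs on this same host.
SMTPHOST = 'localhost'
SMTPPORT = 0            # 0 means "use smtplib's default port"

# Maximum recipients per SMTP transaction (placeholder value).
SMTP_MAX_RCPTS = 500
```

None of this helps unless the MTA on the other end accepts the mail 
promptly; that part is MTA configuration, not Mailman configuration.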


I say all this as a specialist in designing and building large-scale 
mail systems (such as AOL), a long-term member of the Mailman project, 
and a member of the postmaster team for python.org where all the 
official Mailman mailing lists are hosted -- using Mailman.

-- 
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>

