[Mailman-Users] Mailman throughput
Brad Knowles
brad at shub-internet.org
Mon Aug 15 09:49:33 CEST 2011
On 08/14/2011 11:24 PM, Ivan Fetch wrote:
> Brad, I think we are already accomplishing a lot of this minimalism,
> since the MTA on the Mailman VM is only accepting the message via SMTP,
> then handing it off to Mailman via the Postfix aliases. The spam and
> other checks are done beforehand, by another upstream gateway MTA. That
> gateway then hands mailing list messages off to the Mailman box.
You're talking about inbound, and how you have outsourced many of these
kinds of checks to other boxes. That's fine as far as it goes, but I
was talking about *outbound*, from Mailman to the world of recipients.
A certain number of messages come into your system, and each one
requires a certain amount of processing to scan it for viruses, spam,
etc.
However, on outbound, you will presumably have this same number of
messages multiplied by the number of recipients.
If that's an average of ten recipients per list, then you have a factor
of ten increase in the amount of work done to scan those messages for
viruses and spam -- and since all those messages are largely identical
in those regards, that's all wasted work, and therefore that's all work
that you want to avoid to the greatest degree possible.
As you scale up to thousands, tens of thousands, or hundreds of
thousands of recipients, the more work you can avoid doing on the
outbound side, the better.
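To put rough numbers on that multiplication, here is a minimal sketch.
The posting count and the function name are made up for illustration;
they are not measurements from any real system.

```python
def outbound_scans(inbound_messages, recipients_per_message):
    """Scans performed if every outbound copy is scanned individually."""
    return inbound_messages * recipients_per_message


inbound = 1000  # hypothetical number of list postings per day

for recipients in (10, 1000, 100000):
    total = outbound_scans(inbound, recipients)
    # Each posting only needs one scan; every scan beyond that is redundant.
    redundant = total - inbound
    print("%6d recipients: %11d scans, %11d redundant"
          % (recipients, total, redundant))
```

Scanning once on the way in keeps the cost at the number of postings,
no matter how many recipients each list has.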
> This is true for subscribers which are not part of our organization
> - the MTA which Mailman relays to accepts the messages, and then deals
> with any delivery issues. However, accounts for which this MTA is the
> final destination, will tempfail under certain conditions, like
> mismatched attributes in an LDAP record, or an issue with the mailstore.
And those are precisely the circumstances under which the MTA should not
be handing a tempfail condition back to Mailman. It should go ahead and
blindly accept those messages and accept responsibility for them, and
then it should deal with those tempfail cases internally.
Mailman is really, really bad at handling large queues, for all the same
reasons that MTAs from twenty years ago were bad at handling large
queues -- both are largely single threaded and disk bound, and both use
a single outbound directory for all file locking and message queueing.
That means performance collapses as the queue grows, because storing or
pulling up the next file requires scanning what is effectively a linear
list of directory entries on disk.
Modern MTAs are fully multi-threaded, they keep their active queue in
memory as opposed to putting it on disk, and they hash the disk queues
for inactive messages over a large distributed set of directories, so
that if one process is working on the files in a given directory then
the odds are vanishingly small that any other process would be blocked
waiting on the lock for that directory.
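The directory hashing idea can be sketched roughly as follows. This is
an illustration only, not the layout of any particular MTA: the spool
path, the queue IDs, the use of SHA-1, and the function name are all
assumptions made for the example.

```python
import hashlib
import os
from collections import Counter


def hashed_queue_path(queue_root, queue_id, depth=2):
    """Map a queue ID to a nested subdirectory derived from its hash."""
    digest = hashlib.sha1(queue_id.encode("utf-8")).hexdigest()
    # One directory level per leading hex digit, e.g. root/a/3/<queue_id>.
    subdirs = list(digest[:depth])
    return os.path.join(queue_root, *subdirs, queue_id)


# Hypothetical queue IDs, spread over at most 16**2 = 256 directories.
ids = ["msg%05d" % n for n in range(10000)]
per_dir = Counter(
    os.path.dirname(hashed_queue_path("/var/spool/example", i)) for i in ids
)
print("directories used:", len(per_dir))
print("largest directory:", max(per_dir.values()), "files")
```

With the files spread out like this, each directory stays small, and
two processes rarely need the same directory at the same time.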
You wouldn't put a Model-T Ford into a Formula-1 race today, and
likewise you should not be depending on ancient queueing methods as your
bottleneck for handling all your outgoing mail.
Or, if you have no choice but to depend on them, then you should
minimize that dependence as much as you possibly can.
> For better or worse, we are moving a lot of our mailboxes to mail
> forwards over the next few months - this will move the rest of these
> tempfails out of Mailman's SMTP / retry queue, and into the downstream
> relay (where they belong).
From Mailman's perspective, your local MTA *IS* the downstream relay,
and it should not be causing these kinds of loads to be put on Mailman.
Pull as much of the queueing as possible out of Mailman and put it into
your local MTA. From there, it becomes an MTA problem, and it doesn't
matter to Mailman whether the mailboxes are local or remote.
I say all this as a specialist in designing and building large-scale
mail systems (such as AOL), a long-term member of the Mailman project,
and a member of the postmaster team for python.org where all the
official Mailman mailing lists are hosted -- using Mailman.
--
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>