[Mailman-Users] Mailman throughput
brad at shub-internet.org
Mon Aug 15 10:39:48 CEST 2011
On 08/15/2011 02:49 AM, Brad Knowles wrote:
> You're talking about inbound, and how you have outsourced many of these
> kinds of checks to other boxes. That's fine as far as it goes, but I was
> talking about *outbound*, from Mailman to the world of recipients.
> You are likely to have a certain number of messages coming into your
> system which will require a certain amount of processing to scan them
> for viruses and spam, etc....
> However, on outbound, you will presumably have this same number of
> messages multiplied by the number of recipients.
I just thought of an analogy that I think will be very useful here.
Input and output are two related, but very different processes -- for
computers as well as for humans. Having a pee is a different process
from drinking a beer -- related, but still different.
Generally speaking, you want to avoid mixing your inputs and your
outputs -- and this gets more and more important as you scale up. A
single person who pees in the Colorado River is not going to materially
impact the water quality of the downstream communities, but if an entire
city were to dump untreated sewage into the river on an ongoing basis,
that would be a different matter.
Likewise with e-mail, what works well for you as a small site probably
won't work so well as you get bigger and bigger. Mixing your inputs and
outputs is one of the things that stops working.
For example, when processing incoming e-mail, you want to apply one set
of rules for handling viruses, but you want to apply a different set for
outbound mail. In both cases, you want to notify the internal person at
your site about the situation and let them work on how to deal with the
issue, but they are the recipient on inbound and they are the sender on
outbound -- so you can't take a simple "always notify the sender" or
"always notify the recipient" policy.
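
As a rough sketch of that point, the notification target has to depend on the direction of the message. The function name, the direction constants and the example addresses below are made up for illustration; they are not part of Mailman or any MTA.

```python
# Hypothetical sketch: pick who to notify about an infected message.
INBOUND = "inbound"    # mail arriving at our site from outside
OUTBOUND = "outbound"  # mail leaving our site for the outside world


def internal_party(direction, sender, recipient):
    """Return the address of the person at our own site."""
    if direction == INBOUND:
        # On inbound mail, our user is the recipient.
        return recipient
    if direction == OUTBOUND:
        # On outbound mail, our user is the sender.
        return sender
    raise ValueError("unknown direction: %r" % (direction,))
```

With this, an inbound message from stranger@example.net to alice@example.org notifies alice, and so does an outbound message from alice to the stranger. A fixed "sender" or "recipient" rule would get one of those two cases wrong.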
If you have performance complaints, then you have to look at where your
bottlenecks are and what those bottlenecks do to you. Eliminate the
biggest bottleneck first, then work on the next one. If cost is a
factor, then look for big bottlenecks that are cheap to fix, and keep
eliminating the key bottlenecks as each new one shows up. Again, mixing
inputs and outputs tends to be one of those key bottlenecks, both
overall and with regard to return-on-investment.
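
To make the two orderings concrete, here is a small sketch that ranks bottlenecks first by raw impact and then by impact per unit of cost. The bottleneck names, the seconds lost and the costs are all invented numbers for illustration, not measurements from any real site.

```python
from collections import namedtuple

# seconds_lost: delay per message attributed to this bottleneck (made up)
# cost: what it would cost to fix, in whatever currency you like (made up)
Bottleneck = namedtuple("Bottleneck", "name seconds_lost cost")


def by_impact(bottlenecks):
    """Biggest bottleneck first, ignoring cost."""
    return sorted(bottlenecks, key=lambda b: b.seconds_lost, reverse=True)


def by_return_on_investment(bottlenecks):
    """Most time saved per unit of cost first; free fixes sort to the top."""
    def roi(b):
        return float("inf") if b.cost == 0 else b.seconds_lost / b.cost
    return sorted(bottlenecks, key=roi, reverse=True)


candidates = [
    Bottleneck("disk I/O", 25.0, 2000),
    Bottleneck("outbound virus rescan", 40.0, 0),  # a config change, no hardware
    Bottleneck("DNS lookups", 5.0, 100),
]

print([b.name for b in by_impact(candidates)])
print([b.name for b in by_return_on_investment(candidates)])
```

In this invented data set the outbound rescan comes out on top either way, but the other two swap places once cost is taken into account.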
In the case of Mailman, we can reasonably guarantee that we follow the
GIGO principle -- Garbage In, Garbage Out. If you can keep the inbound
flow of e-mail clean, then there's nothing that Mailman does that should
make the outbound flow dirty again, so you can safely bypass all the
checks that you would normally make at the MTA level for outbound mail.
At least, as far as your local MTA is concerned, you can eliminate all
those checks. If the checks are done at your edge, then changes to your
local MTA won't have any impact on whether or not that work is done and
how much it costs you, but at least you can avoid causing unnecessary
additional load on Mailman itself.
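
One way to arrange this with Mailman 2.1 is to have Mailman hand its outbound mail to a separate listener on the local MTA that has the content checks switched off. A minimal sketch of the Mailman side, as lines added to an existing mm_cfg.py, is below. The port number 10025 and the existence of such an unfiltered listener are assumptions for the sake of the example; setting up that listener in your MTA is not shown here.

```python
# Additions to mm_cfg.py (Mailman 2.1). Sketch only.
# Assumption: the local MTA has a second listener on port 10025
# that accepts mail from localhost without virus/spam scanning.
SMTPHOST = 'localhost'
SMTPPORT = 10025
```

Inbound mail still arrives through the normal, fully checked path; only the mail that Mailman itself sends out goes through the unfiltered port.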
Of course, the nature of mailing lists means that Mailman will multiply
by orders of magnitude the amount of work to be done on outbound as
compared to inbound, so if you can eliminate any of those unnecessary
checks then that will tend to be a huge win overall with regard to both
performance and monetary cost -- you won't have to devote so much money
and resources to building a larger system to handle the flow, if you can
make sure that the Mailman part of that flow is already clean and
therefore doesn't need to be re-checked.
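
A back-of-the-envelope version of that multiplication, with made-up numbers: if every outbound copy is checked separately, the number of outbound checks is the number of inbound posts times the number of recipients. Treat that as an upper bound, since an MTA may batch several recipients into one delivery. The function name and the figures are illustrative only.

```python
def outbound_checks(inbound_messages, recipients_per_message):
    """Upper bound on outbound scans if each copy is checked on its own."""
    return inbound_messages * recipients_per_message


inbound = 100       # made-up: posts to the list per day
recipients = 5000   # made-up: subscribers on the list

print(inbound)                               # inbound scans: 100
print(outbound_checks(inbound, recipients))  # outbound scans: 500000
```

Scanning 100 messages on the way in is cheap; re-scanning up to 500000 copies on the way out is where the money goes.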
So, the general rule is: don't mix the inputs and outputs, especially
as you scale up.
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>