[Mailman-Users] Mailman throughput

Mon Aug 15 10:39:48 CEST 2011

On 08/15/2011 02:49 AM, Brad Knowles wrote:

> You're talking about inbound, and how you have outsourced many of these
> kinds of checks to other boxes. That's fine as far as it goes, but I was
> talking about *outbound*, from Mailman to the world of recipients.
>
>
> You are likely to have a certain number of messages coming into your
> system which will require a certain amount of processing to scan them
> for viruses and spam, etc....
>
> However, on outbound, you will presumably have this same number of
> messages multiplied by the number of recipients.

I just thought of an analogy that I think will be very useful here. 
Input and output are two related, but very different processes -- for 
computers as well as humans.  Having a pee is a different process 
from drinking a beer -- related, but still different.

Generally speaking, you want to think carefully before mixing your 
inputs and your outputs -- and this gets more and more important as you 
scale up.  A 
single person who pees in the Colorado River is not going to materially 
impact the water quality of the downstream communities, but if an entire 
city were to dump untreated sewage into the river on an ongoing basis, 
that would be a different matter.

Likewise with e-mail, what works well for you as a small site will 
probably not work so well as you get bigger and bigger.  Mixing your 
inputs and outputs is one of those factors.

For example, when processing incoming e-mail, you want to apply one set 
of rules for handling viruses, but you want to apply a different set for 
outbound mail.  In both cases, you want to notify the internal person at 
your site about the situation and let them work on how to deal with the 
issue, but they are the recipient on inbound and they are the sender on 
outbound -- so you can't take a simple "always notify the sender" or 
"always notify the recipient" policy.
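
To make that concrete, here is a minimal sketch of a direction-aware 
notification rule.  The function name and the "inbound"/"outbound" 
labels are made up for illustration and do not come from any particular 
virus scanner:

```python
def local_party_to_notify(direction, sender, recipient):
    """Return the address of the local user to tell about an infected message.

    On inbound mail the local user is the recipient; on outbound mail the
    local user is the sender.  A fixed "always notify the sender" or
    "always notify the recipient" rule gets one direction wrong.
    """
    if direction == "inbound":
        return recipient
    if direction == "outbound":
        return sender
    raise ValueError("direction must be 'inbound' or 'outbound'")


# Hypothetical addresses, for illustration only.
inbound_target = local_party_to_notify(
    "inbound", "stranger@example.net", "alice@example.org")
outbound_target = local_party_to_notify(
    "outbound", "alice@example.org", "stranger@example.net")
```

In both calls the local user (alice@example.org in this made-up case) 
is the one who gets told.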

If you have performance complaints, then you have to look at where your 
bottlenecks are and what those bottlenecks do to you.  Eliminate the 
biggest bottleneck first, then work on the next one.  If cost is a 
factor, then try to find big bottlenecks that you can fix cheaply, 
and keep eliminating the key bottlenecks as each new one shows up.  
Again, mixing inputs and outputs tends to be one of those key 
bottlenecks, both overall and with regard to return-on-investment.
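
One rough way to put that prioritisation on paper is to rank candidate 
fixes by how much time each one saves per unit of cost.  The names and 
numbers below are invented purely to show the mechanics, not 
measurements from any real site:

```python
def rank_fixes(candidates):
    """Sort candidate fixes by seconds saved per unit of cost, best first."""
    return sorted(
        candidates,
        key=lambda c: c["seconds_saved"] / c["cost"],
        reverse=True,
    )


# Invented figures: seconds of processing saved per day, and relative cost.
candidates = [
    {"name": "add a second MTA host", "seconds_saved": 900.0, "cost": 50.0},
    {"name": "buy faster disks", "seconds_saved": 600.0, "cost": 20.0},
    {"name": "stop rescanning list mail", "seconds_saved": 400.0, "cost": 1.0},
]

ranked = rank_fixes(candidates)
ranked_names = [c["name"] for c in ranked]
```

With these made-up figures the cheap configuration change comes out on 
top even though it saves the least time in absolute terms.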

In the case of Mailman, we can reasonably guarantee that we follow the 
GIGO principle -- Garbage In, Garbage Out.  If you can keep the inbound 
flow of e-mail clean, then there's nothing that Mailman does that should 
make the outbound flow dirty again, so you can safely bypass all the 
checks that you would normally make at the MTA level for outbound mail 
from Mailman.

At least, as far as your local MTA is concerned, you can eliminate all 
those checks.  If the checks are done at your edge, then changes to your 
local MTA won't have any impact on whether or not that work is done and 
how much it costs you, but at least you can avoid causing unnecessary 
additional load on Mailman itself.
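
One way this is sometimes arranged with Mailman 2.1, sketched here 
under assumptions: point Mailman's outbound SMTP at a separate listener 
on the local MTA that does no content scanning.  SMTPHOST and SMTPPORT 
are settings from Mailman 2.1's Defaults.py; the port number 10025 is 
only an assumption, and the MTA side (a listener on that port, reachable 
only from localhost, with scanning turned off) has to be set up 
separately and is not shown here:

```python
# Lines to add to Mailman 2.1's mm_cfg.py, below its existing
# "from Defaults import *" line.

# Deliver outbound list mail to the MTA on this host...
SMTPHOST = 'localhost'

# ...on a dedicated port.  10025 is an assumed value: use whatever port
# your MTA's scan-free listener is actually configured on.
SMTPPORT = 10025
```

Mailman's qrunners need a restart before a change like this takes 
effect.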

Of course, the nature of mailing lists means that Mailman will multiply 
by orders of magnitude the amount of work to be done on outbound as 
compared to inbound, so if you can eliminate any of those unnecessary 
checks then that will tend to be a huge win overall with regard to both 
performance and monetary cost -- you won't have to devote so much money 
and resources to building a larger system to handle the flow, if you can 
make sure that the Mailman part of that flow is already clean and 
therefore doesn't need to be re-checked.
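
To put rough numbers on that multiplication, here is a small 
back-of-the-envelope sketch.  The figures are hypothetical, and it 
assumes the worst case of one check per recipient delivery; an MTA that 
scans once per multi-recipient SMTP transaction would do fewer:

```python
def outbound_deliveries(inbound_posts, recipients_per_post):
    """Each post accepted by the list fans out to every recipient."""
    return inbound_posts * recipients_per_post


# Hypothetical daily volume for a single list.
inbound_posts = 100
recipients_per_post = 5000

deliveries = outbound_deliveries(inbound_posts, recipients_per_post)

# Checks performed if only inbound mail is scanned.
checks_inbound_only = inbound_posts

# Checks performed if every outbound delivery is scanned again as well.
checks_with_outbound = inbound_posts + deliveries

print("outbound deliveries:", deliveries)
print("checks, inbound only:", checks_inbound_only)
print("checks, inbound and outbound:", checks_with_outbound)
```

With these made-up figures, rechecking outbound mail turns 100 checks a 
day into 500,100.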

So, the general rule is: don't mix your inputs and outputs, especially 
as you scale up.

-- 
Brad Knowles <brad at shub-internet.org>
LinkedIn Profile: <http://tinyurl.com/y8kpxu>