[Mailman-Developers] Re: [Mailman-Users] Allowing users to join without specifying passwords
Chuq Von Rospach
chuqui@plaidworks.com
Sun, 17 Jun 2001 00:46:47 -0700
On Friday, June 15, 2001, at 01:19 PM, Barry A. Warsaw wrote:
> CVR> points. but we need to quantify what those points are and
> CVR> what the impact is, so we can decide just how to move forward
> CVR> on this.
> I'd love to see any statistic you (or anybody) gathers on this
> subject. It's definitely intriguing, but right now I don't have the
> time or systems to do this kind of data gathering.
>
Okay, here's a first cut at some data.
I'm going to assume the following:
1000 subscribers, with no digest subscribers to keep things simple; assume individual messages only.
The message size is 10K, including headers.
The overhead for opening a connection to send a message is about 1K (which is pretty close).
The cost of adding an address to an existing message is about .1K (also pretty close).
The practical limit on the number of addresses you can piggyback onto one message is 100, since RFC 2821 specifies that as the smallest number of recipients a site is REQUIRED to accept. In practice, because of non-conformant sites, you have to be careful setting it above 50 these days: some sites lower this number because they think it slows down spammers. (I've yet to be convinced it makes a damn bit of difference, especially since MTAs like Postfix now recognize the 452 response and auto-adjust. This is another place where sendmail seems behind the technology curve, FWIW.)
How much bandwidth is used depends on two factors:
- your piggyback value (in Mailman, that's SMTP_MAX_RCPTS)
- how many domains have more than one subscriber.
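Put together, those assumptions give a simple per-domain cost model. Here's a minimal Python sketch of it; domain_cost and the constants are illustrative names, not anything in Mailman. A domain with n subscribers needs ceil(n / max_rcpts) messages at 11K each (10K message plus 1K connection overhead), and every other address rides along at .1K.

```python
import math

MESSAGE_K = 10.0    # assumed message size in K, headers included
CONNECTION_K = 1.0  # assumed per-connection overhead in K
RCPT_K = 0.1        # assumed cost in K of each piggybacked address

def domain_cost(n_subscribers: int, max_rcpts: int) -> float:
    """Approximate K of bandwidth to deliver one post to every subscriber in a domain."""
    messages = math.ceil(n_subscribers / max_rcpts)  # separate SMTP deliveries needed
    piggybacked = n_subscribers - messages           # addresses riding on those deliveries
    return messages * (MESSAGE_K + CONNECTION_K) + piggybacked * RCPT_K

# A lone subscriber, a three-subscriber domain, and a 142-subscriber domain, capped at 100.
for n in (1, 3, 142):
    print(n, round(domain_cost(n, 100), 1))
```

Both factors show up directly: max_rcpts is the piggyback value, and only domains with more than one subscriber ever get the cheap .1K rate.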
Here's how plaidworks breaks down:
3101 subscribers across 1287 domains. That's an average of about 2.4 subscribers per domain, but the numbers skew wildly, so the average is meaningless.
Here's the per-domain breakdown:
Subscribers/domain   Domains   Users   Domain
------------------   -------   -----   ----------------
                 1       263     263
                 2       142     284
                 3        40     120
                 4        19      76
                 5        16      80
                 6        10      60
                 7         7      49
                 8         3      24
                 9         6      54
                10         2      20
                11         2      22
                12         2      24
                13         1      13
                14         1      14
                16         1      16
                17         1      17   worldnet.att.net
                22         1      22   juno.com
                29         1      29   mindspring.com
                30         1      30   pacbell.net
                35         1      35   plaidworks.com
                43         1      43   sympatico.ca
                53         1      53   earthlink.net
               150         1     150   home.com
               173         1     173   yahoo.com
               228         1     228   hotmail.com
               441         1     441   aol.com
If you're scoring at home, about 32% of subscribers come from the last 4 domains: roughly 5% each for home.com and yahoo.com, 7% for hotmail.com, and 14% for aol.com. Those are your 500-pound gorillas (AOL is 800 pounds); piss them off at your own risk.
At the other end, 8% of my subscribers are the only subscriber from their domain, about 18% are on domains with 1 or 2 subscribers, and 26% are on domains with 5 or fewer subscribers.
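For anyone who wants to check these shares against their own list, here's a quick Python sketch. It takes the user counts from the table above and divides by the 3101-subscriber total; the variable names are just for illustration.

```python
TOTAL_SUBSCRIBERS = 3101

# Users on the four biggest domains, from the table above.
big_domains = {"home.com": 150, "yahoo.com": 173, "hotmail.com": 228, "aol.com": 441}

# Users on small domains, keyed by subscribers per domain (same table).
small_domain_users = {1: 263, 2: 284, 3: 120, 4: 76, 5: 80}

def pct(users: int) -> float:
    """Share of the whole subscriber list, as a percentage."""
    return 100.0 * users / TOTAL_SUBSCRIBERS

for name, users in big_domains.items():
    print(f"{name:12} {pct(users):4.1f}%")
print(f"big four     {pct(sum(big_domains.values())):4.1f}%")
print(f"1 per domain {pct(small_domain_users[1]):4.1f}%")
print(f"1-2          {pct(small_domain_users[1] + small_domain_users[2]):4.1f}%")
print(f"5 or fewer   {pct(sum(small_domain_users.values())):4.1f}%")
```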
Time for some numbers.
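Back to the 1000-member list for simplicity. Scaled down from my site's numbers, the subscriber list breaks down like this (domains * subscribers per domain = users):
 85 * 1   = 85
 45 * 2   = 90
 12 * 3   = 36
  6 * 4   = 24
[...]
  1 * 48  = 48
  1 * 55  = 55
  1 * 73  = 73
  1 * 142 = 142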
That's 553, or 55% of the subscribers, wedged tightly on both ends of
the curve. We can extrapolate what they'll do to bandwidth from the end
cases if we need to.
Extreme case: SMTP_MAX_RCPTS = 1.
1000 subscribers * (10K message + 1K overhead) = 11,000K of bandwidth.
Extreme case: SMTP_MAX_RCPTS = 100
These get sent down the line this way:
85 * 11K
45 * (1 * 11K + 1 * .1K)
12 * (1 * 11K + 2 * .1K)
6 * (1 * 11K + 3 * .1K)
[...]
1 * 11K + 47 * .1K
1 * 11K + 54 * .1K
1 * 11K + 72 * .1K
2 * 11K + 140 * .1K
Do you see how I got these numbers? Take the 12 domains with three subscribers: you pay 11K for the connection and the first address, then piggyback the other two addresses at .1K each. You don't see real savings until the big domains, and AOL goes over the 100-address limit, so it gets split into two messages.
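The listing above follows mechanically from the cost model. Here's a Python sketch that regenerates each line (messages sent, addresses piggybacked) and the totals for the 553-subscriber slice; SUBSET, breakdown and subset_cost are illustrative names, not Mailman code.

```python
import math
from typing import Tuple

MESSAGE_K, CONNECTION_K, RCPT_K = 10.0, 1.0, 0.1  # same assumptions as above

# The 553-subscriber slice: (subscribers in the domain, number of such domains).
SUBSET = [(1, 85), (2, 45), (3, 12), (4, 6), (48, 1), (55, 1), (73, 1), (142, 1)]

def breakdown(n: int, max_rcpts: int) -> Tuple[int, int]:
    """(messages sent, addresses piggybacked) for one domain of n subscribers."""
    messages = math.ceil(n / max_rcpts)
    return messages, n - messages

def subset_cost(max_rcpts: int) -> float:
    """Total K of bandwidth for the slice at a given recipients-per-message cap."""
    total = 0.0
    for n, domains in SUBSET:
        messages, extra = breakdown(n, max_rcpts)
        total += domains * (messages * (MESSAGE_K + CONNECTION_K) + extra * RCPT_K)
    return total

for n, domains in SUBSET:
    messages, extra = breakdown(n, 100)
    print(f"{domains} * ({messages} * 11K + {extra} * .1K)")

print(f"SMTP_MAX_RCPTS=1:   {subset_cost(1):.0f}K")
print(f"SMTP_MAX_RCPTS=100: {subset_cost(100):.0f}K ({subset_cost(100) / subset_cost(1):.0%})")
```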
For this 55%, SMTP_MAX_RCPTS = 1 costs 6083K (553 * 11K). At 100, it's about 1723K. That's 28% of the first number, so chunking at 100 cuts 72% of the bandwidth. The tradeoff is performance, though: it takes a lot longer to deliver those AOL addresses, because with only two large batches there's little room to parallelize delivery. Package up 100 AOL addresses in one batch, and none of them get delivered until all 100 addresses have been sent to AOL and accepted. It's much faster to send them as ten batches of ten in parallel, but that's the tradeoff here: cut network bandwidth, and you slow delivery to the larger domains.
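Here's that tradeoff in code: a Python sketch that chunks AOL's addresses into SMTP transactions either as one batch of 100 or as ten batches of ten, and prices each layout with the same rough model (11K per message, .1K per piggybacked address). The batches() helper and the addresses are hypothetical.

```python
from typing import Iterator, List

def batches(recipients: List[str], size: int) -> Iterator[List[str]]:
    """Yield groups of at most `size` recipients; each group is one SMTP transaction."""
    for start in range(0, len(recipients), size):
        yield recipients[start:start + size]

def layout_cost(groups: List[List[str]]) -> float:
    """Rough K of bandwidth: 11K per message plus .1K per extra recipient on it."""
    return sum(11.0 + 0.1 * (len(group) - 1) for group in groups)

aol = [f"user{i}@aol.com" for i in range(100)]  # hypothetical addresses

one_batch = list(batches(aol, 100))   # one transaction: no address is done until all 100 are
ten_batches = list(batches(aol, 10))  # ten transactions that could run on parallel connections

for label, groups in (("1 x 100", one_batch), ("10 x 10", ten_batches)):
    print(f"{label}: {len(groups)} transaction(s), {layout_cost(groups):.1f}K")
```

The single batch is far cheaper on the wire (about 20.9K against 119K under this model), but it gives you nothing to run in parallel.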
Okay, let's look at a case in the middle: SMTP_MAX_RCPTS = 5. Domains with 5 or fewer subscribers don't change, but the big domains do:
85 * 11K
45 * (1 * 11K + 1 * .1K)
12 * (1 * 11K + 2 * .1K)
6 * (1 * 11K + 3 * .1K)
[...]
1 * (10 * 11K + 38 * .1K)
1 * (11 * 11K + 44 * .1K)
1 * (15 * 11K + 58 * .1K)
1 * (29 * 11K + 113 * .1K)
that works out to (trust me) about 2378K, or about a 60% reduction.
Let's try SMTP_MAX_RCPTS = 2.
85 * 11K
45 * (1 * 11K + 1 * .1K)
12 * (2 * 11K + 1 * .1K)
6 * (2 * 11K + 2 * .1K)
[...]
1 * (24 * 11K + 24 * .1K)
1 * (28 * 11K + 27 * .1K)
1 * (37 * 11K + 36 * .1K)
1 * (71 * 11K + 71 * .1K)
That works out to about 3609K, or about a 41% cut.
From a rough look at the domains in the middle, I'd say these numbers are good to +/-10%.
What does this mean? Here's the executive summary:
The network penalty between SMTP_MAX = 1 (effectively VERP) and any kind of batching (SMTP_MAX > 1) is roughly 50%. To get VERP, customized footers, or customized anything, you roughly double your network bandwidth.
There is very little advantage to setting SMTP_MAX > 5, UNLESS your subscriber base is heavily concentrated on very few sites. If you have really large groups of subscribers on AOL or Hotmail, a larger value can help cut network bandwidth, but at best it seems to be about a 10% improvement.
If you plot the numbers I did on a curve, you can see just how little advantage you get from raising the number. You get most of the advantage by the time you reach 5, and the line past 5 is very flat...
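Here's one way to draw that curve: a Python sketch that sweeps the batch size over the same 553-subscriber slice and prints the fraction of bandwidth saved relative to SMTP_MAX_RCPTS = 1, with a crude text bar. It uses the rough model used throughout (11K per message, .1K per piggybacked address); SUBSET, subset_cost and savings are illustrative names.

```python
import math

# (subscribers in the domain, number of such domains) for the 553-subscriber slice
SUBSET = [(1, 85), (2, 45), (3, 12), (4, 6), (48, 1), (55, 1), (73, 1), (142, 1)]

def subset_cost(max_rcpts: int) -> float:
    """Rough K of bandwidth for the slice at a given recipients-per-message cap."""
    total = 0.0
    for n, domains in SUBSET:
        messages = math.ceil(n / max_rcpts)
        total += domains * (11.0 * messages + 0.1 * (n - messages))
    return total

def savings(max_rcpts: int) -> float:
    """Fraction of bandwidth saved compared with one recipient per message."""
    return 1.0 - subset_cost(max_rcpts) / subset_cost(1)

for limit in (1, 2, 3, 5, 10, 25, 50, 100):
    bar = "#" * round(50 * savings(limit))
    print(f"{limit:>3} {savings(limit):6.1%} {bar}")
```

Under this model the cost never goes up as the cap grows, and the step from 5 to 100 is worth only about another 11 points.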
Interesting: I honestly didn't expect to see THIS big a difference. I was expecting more like a 25-30% increase in bandwidth for VERP-type delivery.
My thoughts on what this means for future directions:
Customized messages (VERPing, encoded unsub URLs, all of that...) should definitely be an option in Mailman 2.1.
I would set Mailman 2.1's default to have this turned ON, giving us the customized unsubscribe links and so on, but document it so users know to turn it off on slow networks.
If users turn it off, I recommend that SMTP_MAX default to 5, and that we document that it makes little sense to change it unless a site is horribly network limited: even setting it to the max only gains them another 10% (and if they're THAT network limited, they're seriously asking for trouble anyway), and only if their subscriber base fits a profile that lends itself to the compression. Setting it large also leaves them open to spam-blocking by systems that don't necessarily follow the standards or behave correctly.
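On a 2.0.x site, that recommendation comes down to a one-line override in mm_cfg.py, the file where Mailman expects local changes to the settings in Defaults.py. A sketch, assuming the stock mm_cfg.py that begins with "from Defaults import *" (left as a comment here so the fragment stands alone):

```python
# mm_cfg.py: site overrides for Mailman 2.0.x
# The stock file begins with the line below; it is commented out here so the
# fragment runs on its own outside a Mailman installation.
# from Defaults import *

# Cap recipients per SMTP transaction at 5: past that, the bandwidth savings
# flatten out (roughly another 10% at best), while large values risk running
# into sites that accept fewer recipients than the standard requires.
SMTP_MAX_RCPTS = 5
```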
We should ALSO note that some MTAs (Postfix, for instance) might override SMTP_MAX anyway: you could set it to 100, but Postfix might be configured smaller, so admins have to be aware of those potential interactions. You then get into the issues of tuning all this, few delivery threads with lots of addresses vs. many threads in parallel, and all that fun. I guess I'm trying to say that you can't tune Mailman in isolation from the MTA (and down that road lies a huge rathole of trying to document this stuff...).
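A crude way to picture that interaction: Mailman's SMTP_MAX_RCPTS decides how addresses are grouped when they're handed to the MTA, and the MTA's own per-delivery recipient cap may split those groups further. This Python sketch assumes the MTA splits each Mailman transaction but never merges separate ones, and ignores queueing and connection reuse; mta_limit is a hypothetical stand-in for whatever cap your MTA enforces (in Postfix, I believe the relevant knob is default_destination_recipient_limit).

```python
import math

def outbound_messages(n_subscribers: int, smtp_max_rcpts: int, mta_limit: int) -> int:
    """Messages the MTA sends to one domain when both Mailman and the MTA cap recipients."""
    total = 0
    remaining = n_subscribers
    while remaining > 0:
        batch = min(remaining, smtp_max_rcpts)  # one transaction from Mailman to the MTA
        total += math.ceil(batch / mta_limit)   # the MTA may split it further
        remaining -= batch
    return total

# 142 AOL subscribers with an MTA capped at 50 recipients per delivery
print(outbound_messages(142, 100, 50))  # 3: the MTA splits Mailman's 100-address batch
print(outbound_messages(142, 5, 50))    # 29: Mailman's cap is the tighter one
```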
But from these numbers, any 2.0.x installation of Mailman should set SMTP_MAX to between 2 and 5, unless it's horribly network limited. It makes no sense to go larger than 5, and it makes no sense to use 1 unless you've applied some kind of VERP patch.
For 2.1, we want to implement these customizations and default them on, but with a 50% network hit, we definitely want to make it clear what's going on and make it possible for admins to turn it off and return to a generic URL and non-customized e-mail.
Barry's mileage may vary on his preferred default, of course, and it's his show. I think the customized URL/email capability is a huge advantage and most sites will benefit from it, but the network hit might kill some sites, so we have to give them an easy way to turn the feature off.
What do y'all think? I've included mailman-developers on this reply,
since while this started on mm-users, it really ought to be discussed on
the developers list...
--
Chuq Von Rospach, Internet Gnome <http://www.chuqui.com>
[<chuqui@plaidworks.com> = <me@chuqui.com> = <chuq@apple.com>]
Yes, yes, I've finally finished my home page. Lucky you.
Yes, I am an agent of Satan, but my duties
are largely ceremonial.