[Mailman-Developers] Huge lists

Chuq Von Rospach chuqui@plaidworks.com
Thu, 25 May 2000 11:05:57 -0700

At 10:32 AM +0100 5/25/2000, Nigel Metheringham wrote:
>Wietse had some figures on MTA performance analysis which he used as
>part of the design process for Postfix.  He concluded that disk I/O was
>*the* limiting factor for an MTA

That matches my experience. I've had good results using RAM disks to 
minimize this, which is sort of cheating, but worth it. And much of 
the structural changes to Sendmail 8.10 are aimed at reducing this 
impact, so they've finally figured this out, too.

>If we have a million user list... and a message of a few K, I'm not
>sure I want to have a few GB of queue space taken up.

That's one reason in favor of having the MLM monitor MTA backlogs and 
throttle itself. On my systems, I try to tune things so that I give 
the MTA enough to chew on and get up to speed, but not so much that 
it starts thrashing trying to deal with queue overhead issues. I'm 
hoping Sendmail 8.10 and, down the road, Postfix allow me to no longer 
worry about this (or worry about it less...). But for large lists, 
it's another issue.
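
As a rough illustration of the self-throttling idea, here is a minimal 
Python sketch. Everything in it is an assumption made for the example: 
the names (queue_depth, send_throttled, hand_off), the high/low water 
marks, and the notion that the MTA's backlog can be estimated by 
counting files in one queue directory. Real MTAs lay out their queues 
differently, so treat this as a shape, not a recipe.

```python
import os
import time


def queue_depth(queue_dir):
    """Estimate MTA backlog by counting files in a queue directory.

    Assumption: one file per queued item in a single flat directory.
    """
    return sum(1 for entry in os.scandir(queue_dir) if entry.is_file())


def send_throttled(batches, hand_off, queue_dir,
                   high_water=500, low_water=100, poll_seconds=5,
                   sleep=time.sleep, depth=queue_depth):
    """Hand batches to the MTA, pausing while its queue is too deep.

    hand_off is a hypothetical callable that injects one batch into the MTA.
    When the backlog reaches high_water we stop feeding the MTA and wait
    until it has drained to low_water or below, so it always has work
    but never drowns in queue overhead.
    """
    for batch in batches:
        if depth(queue_dir) >= high_water:
            while depth(queue_dir) > low_water:
                sleep(poll_seconds)
        hand_off(batch)
```

Using two thresholds instead of one keeps the sender from flapping 
on and off around a single limit. The right numbers depend entirely 
on the MTA and the hardware.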

>Having said that I *really* would like the possibility of the
>occasional message (maybe even just the password reminders.. although
>I'd prefer a method where some messages if the list was in a state
>where it has recently seen bounces that it cannot tie to a particular
>subscriber) be sent out using VERP.

It depends on the type of list. If you're a busy list, VERPing every 
message could well be overkill, but again, it gets back to being able 
to do other things as well, like pre-loading addresses into the 
unsubscribe URL. There are some nice user-interface improvements you 
can make once you have VERP to make the whole user experience much 
less painful...
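
For anyone who hasn't seen VERP spelled out, here is a small Python 
sketch of the two things it buys you: a per-recipient envelope sender 
that makes bounces self-identifying, and a pre-loaded unsubscribe 
URL. The address layout (list-bounces+user=domain@listhost), the 
function names, and the "email" query parameter are assumptions made 
for illustration, not a description of what any particular MLM does.

```python
from urllib.parse import urlencode


def verp_encode(bounce_addr, recipient):
    """Build a per-recipient envelope sender.

    Assumed layout: <bounce-local>+<rcpt-local>=<rcpt-domain>@<list-host>
    """
    local, host = bounce_addr.rsplit("@", 1)
    rcpt_local, rcpt_domain = recipient.rsplit("@", 1)
    return f"{local}+{rcpt_local}={rcpt_domain}@{host}"


def verp_decode(envelope_to):
    """Recover the subscriber address from a bounce sent to a VERP address."""
    local, _host = envelope_to.rsplit("@", 1)
    # Assumes the list's own bounce local part contains no "+".
    _prefix, encoded = local.split("+", 1)
    # Domains cannot contain "=", so split on the last one.
    rcpt_local, rcpt_domain = encoded.rsplit("=", 1)
    return f"{rcpt_local}@{rcpt_domain}"


def unsubscribe_url(base_url, recipient):
    """Pre-load the subscriber's address into a (hypothetical) unsubscribe URL."""
    return f"{base_url}?{urlencode({'email': recipient})}"
```

Once each copy is generated per recipient anyway, the same address 
can be dropped into the message footer, which is where the 
user-interface win comes from.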

But if you're an announce list that only comes out occasionally, 
sending out a monthly "here's your password" update on a list that 
averages 2 messages a month seems to be the wrong thing, at least to 
me, because the hassle factor of the noise generated by the 
administrative postings starts to overwhelm the content. Your users 
won't like that (this brings up a sub-discussion, that of the 
"monthly reminder message", but we won't go there now...)

Again, I think we have to remember that there are lots of different 
USES for mail lists, and different usage forms for those types. How 
you handle a twice-a-month enewsletter is going to be much different 
from how you handle a 40-message-a-day discussion list. So there's no 
single right answer, and configurable options to support the 
different flavors are a really Good Thing....

>  However then we also need to
>recode the MTA incoming handling to take that - aliases don't cut it
>any more.

I was thinking last night that what would REALLY, REALLY be useful 
here is an extended SMTP protocol that allows the VERPing to be 
introduced by the receiving SMTP server, rather than the delivery 
server or MLM. And after thinking about it, I went and laid down in a 
dark room until I got over it... (snicker). But if you think about 
it, the downside to VERP is you lose the efficiency of batching 
multiple addresses into a single transaction, so the solution is to 
extend SMTP to allow us to maintain that efficiency while building in 
the VERPing data at time of delivery...

And if you think that's feasible, you need to lie down in a dark 
room... But it's an intriguing thought....
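
To put the batching cost in concrete terms, here is a small Python 
sketch that counts SMTP transactions both ways. The function names 
and the max_rcpts limit are assumptions made for the example, and 
grouping by the literal domain is a simplification: a real MTA 
groups by where the mail is actually delivered (the MX), not by the 
domain string.

```python
from collections import defaultdict


def batched_transactions(recipients, max_rcpts=100):
    """Group recipients by domain, one transaction per group of max_rcpts."""
    by_domain = defaultdict(list)
    for addr in recipients:
        domain = addr.rsplit("@", 1)[1].lower()
        by_domain[domain].append(addr)
    batches = []
    for addrs in by_domain.values():
        for i in range(0, len(addrs), max_rcpts):
            batches.append(addrs[i:i + max_rcpts])
    return batches


def verp_transactions(recipients):
    """With VERP each recipient needs its own envelope sender,
    so every recipient becomes a separate transaction."""
    return [[addr] for addr in recipients]
```

For a list dominated by a handful of big domains, the first 
function returns far fewer transactions than the second, and that 
gap is exactly the efficiency VERP gives up.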

>Counter examples are always problems....  The biggest UK ISP group
>(several "virtual" ISPs use the same bulk ISP service set) has a few
>million users each of whom have their own domain name - so you will
>find that *.freeserve.co.uk (around 2 million domains) all goes to the
>same batch of MXes.

Yeah. And until I realized that you need to worry about domain names 
out to the fourth sub-domain, it was driving my database stuff crazy, 
because lookups sludged out badly. Here in the states, you get used 
to worrying about 2nd level domains, and maybe third, but when your 
audience internationalizes, the rules change... I now track domain 
name uniqueness out to the fourth part, just to handle places like 
freeserve cleanly. Otherwise, all heck breaks loose.
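
A minimal Python sketch of what tracking uniqueness "out to the 
fourth part" might look like. The function name and the max_labels 
parameter are assumptions made for the example; the idea is just to 
key each address on up to the last four labels of its domain instead 
of the last two.

```python
def domain_key(address, max_labels=4):
    """Return the last max_labels labels of the address's domain, lowercased."""
    domain = address.rsplit("@", 1)[-1].lower().rstrip(".")
    labels = domain.split(".")
    return ".".join(labels[-max_labels:])
```

Keyed on two labels, every freeserve subscriber collapses into 
"co.uk"; keyed on four, joe@fred.freeserve.co.uk gets 
"fred.freeserve.co.uk" and the lookups stay selective.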

>[On per-MTA documentation]
>Lets start bullying^Wpersuading people to contribute some documentation
>on this stuff or pointers to existing MTA documentation that addresses

As I find stuff out, I'll definitely make it available, and should be 
able to at least help collate.

>Big lists are a different issue - you need to *choose* your MTA and
>hardware within your constraints for that.   Tuning is probably a
>consultancy job for those.

I wouldn't worry too much about big stuff, either. And to be honest, 
once I catch up on some other stuff, I'll be setting up a mail list 
and other resources for big list admins.

>That particular one you mention should be blocked from the net -
>presumably their upstream is clueless too.

Yup. They effectively ARE blocked from all of my sites. One is in 
Italy, for instance, although another that I've had problems with is 
a school site down in Scottsdale. The problem seems to be sites that 
are running really downrev versions of things that nobody's watching 
or upgrading.

Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

And they sit at the bar and put bread in my jar
and say 'Man, what are you doing here?'"