[Mailman-Developers] Huge lists

J C Lawrence claw@kanga.nu
Wed, 24 May 2000 18:38:09 -0700


On Wed, 24 May 2000 17:08:18 -0700 
Chuq Von Rospach <chuqui@plaidworks.com> wrote:

> At 3:41 PM -0700 5/24/2000, J C Lawrence wrote:

>> Yep, it's second-guessing the MTA, but it's a cheap,
>> cost-effective, minimal-impact guess that has nearly NO punitive
>> effect on Mailman itself.

> yes and no. By bunching stuff together, you help the MTA optimize,
> since it's a safe guess that it's going to (at least) sort by
> domain if it does any kind of connection caching at all.

True.  What I'm curious about, however, is which MTAs do MX sorting
and, more particularly, MX collapsing (e.g. for two different target
domains that share an MX at their lowest, most-preferred level).  The
potential gains there are likely not huge, but could be (guesstimate)
noticeable for high-volume servers whose target lists are spread
across a broad range of domains.

I'll have to check into that some time.  
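
As a rough illustration of what MX collapsing means here (a sketch
only, not how any particular MTA does it): the helper below groups
target domains whose most-preferred MX hosts are identical, so that
one connection could serve all of them.  The function name and the
shape of the `mx_records` input are assumptions made for the example;
in practice the records would come from a DNS resolver.

```python
from collections import defaultdict


def collapse_by_best_mx(mx_records):
    """Group domains that share the same most-preferred MX host(s).

    mx_records is a hypothetical mapping of domain -> list of
    (preference, host) pairs; a lower preference is tried first.
    """
    groups = defaultdict(list)
    for domain, records in mx_records.items():
        if not records:
            # Assumption: with no MX records the MTA falls back to the
            # domain's own address record, so key on the domain itself.
            key = frozenset([domain.lower()])
        else:
            best = min(pref for pref, _host in records)
            # Normalise case and any trailing dot before comparing hosts.
            key = frozenset(
                host.lower().rstrip(".")
                for pref, host in records
                if pref == best
            )
        groups[key].append(domain)
    return {key: sorted(domains) for key, domains in groups.items()}
```

Two domains end up in the same group only when their full sets of
best-preference hosts match, which is a deliberately conservative
reading of "share an MX".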

> You could make a good argument that the best way to optimize is to
> create one mail batch per unique hostname, up to SMTP-MAX-RCPTS,
> at which point you split it into num_addrs/SMTP-MAX-RCPTS batches
> for that hostname, and then let the MTA sort it out from there.

True, this would be a useful optimisation for most of the MTA
architectures I know of.  It's also quite cheap and easy to do, which
makes it even more tempting.
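
The per-hostname batching described above could be sketched roughly
as follows.  This is an illustration only, not Mailman's actual
delivery code: the function name is made up, and the value given to
SMTP_MAX_RCPTS is an assumed placeholder that would need tuning for
the MTA in question.

```python
from collections import defaultdict

SMTP_MAX_RCPTS = 500  # assumed placeholder; tune for the MTA in use


def batch_by_domain(addresses, max_rcpts=SMTP_MAX_RCPTS):
    """Return recipient batches, one domain per batch.

    Domains with more than max_rcpts recipients are split into
    several batches of at most max_rcpts addresses each.
    """
    by_domain = defaultdict(list)
    for addr in addresses:
        # The domain is whatever follows the last "@"; compare it
        # case-insensitively so X.COM and x.com land together.
        domain = addr.rsplit("@", 1)[-1].lower()
        by_domain[domain].append(addr)

    batches = []
    for domain in sorted(by_domain):
        rcpts = by_domain[domain]
        for start in range(0, len(rcpts), max_rcpts):
            batches.append(rcpts[start:start + max_rcpts])
    return batches
```

Each batch would then be handed to the MTA as a single message with
multiple RCPTs, leaving any further sorting to the MTA.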

>> > I guess that we need a per MTA tuning/configuration document.
>> 
>> Aaaargh.  Yes.

> Definitely. Since most of the "performance" issues involve the
> MTA, and the MLM only affects it based on how it stuffs things
> into the MTA.

There gets to be a point, however, where it really exceeds Mailman's
charter.  Mailman is a list server, not a training course on how to
build and configure a high-volume mail system.  While I don't think
we've crossed or even approached that line, in general I'd rather
spend time on Mailman than on high-end server considerations, which
are adequately (?) documented elsewhere.

>> Without going and re-reading it, about the only thing I can think
>> to add to it would be turning off domain checking for localhost
>> RCPTs as per our recent comments if that's not there already.

> By the way, I suggest that before people *assume* this is an
> improvement that it be tested, because the domain checking has to
> be done somewhere...

I've tested it here under Exim (as of about 2 years ago).  The gains
from leaving it to the MTA for connection-time resolution were quite
noticeable.  Mostly, I suspect, because Exim didn't cache (or
pre-stuff) the DNS results from the validity check for MX delivery.
Actually, I don't think Exim maintains a significant DNS cache
across delivery attempts in the first place, assuming, quite rightly
in the general case, that the local nameserver can be trusted to do
that caching for it.  I haven't checked this tho, as my need (I had
a 140K member list) disappeared (the company sponsoring the list
collapsed).

>> How about Postfix?  Anybody know?

> Postfix is "on the list" for later this summer for me...

I followed Postfix actively in its early days, up till about a year
after its first public release, when I got distracted elsewhere (I
used to publicly archive all the Postfix lists here at Kanga.Nu).  I
figure I'll probably roll everything over to Postfix sometime in the
next couple of months, tho I'll miss Exim's nice log analysis and
queue tools.

> Right now, I generally recommend sites doing a lot of mail-list
> traffic...

I generally recommend heartily against Sendmail for such sites.  I
just don't see it as worth the extra effort (or obscurity) when
newer MTAs such as Exim (wot I use currently), qmail or Postfix in
general offer the same or better performance and configurability,
with the added benefit of human-readable/auditable config files.

While it's cheap logic, it's easy to note that none of the very
high-volume commercial email sites out there (Critical Path, Hotmail,
Onelist, EGroups, etc.) are based on Sendmail.

>> Of course not.  Everybody knows that Microsoft Exchange is the
>> one true MTA and all else are but pale imitations.

> don't even JOKE about that. 

You don't know how many times I've nearly uncommented the Exim rule
that would auto-bounce (during SMTP receipt) any message with an
Exchange entry in the Received headers.  It has been tempting.

The only mail software out there that draws more ire from me is
Outlook.  Pathetic.  Absolutely pathetic.  Of course I also have a
still-commented-out procmail rule in place before Mailman that would
auto-bounce messages from Outlook, and the only reason I haven't
uncommented it is that I have too many valued list members who
cannot use anything else (corporate standards).

<sigh>

> As someone who deals with email for a living...

I should probably note at this point that I'm working for Critical
Path on their mail systems.  

> ...the only system that comes *close* to Exchange in the braindead
> category is Lotus notes. 

Sorry, entirely different orders of magnitude there.  Notes is bad,
certainly, and there are few things even close to being as bad as
Notes or CC Mail (tho they've gotten a lot better in recent years
(which isn't saying much)), but Exchange/Outlook make them look
positively angelic in comparison.

> And that's not really close. I have seen so much braindamage out
> of Exchange servers I wish I could simply reject any mail that
> ever touched one....

I got some nice filters...

> You might as well drive your computers with a squirrel on a wheel.

Nope.  That's Notes.  Exchange?  Remember the dead parrot skit...?

-- 
J C Lawrence                                 Home: claw@kanga.nu
----------(*)                              Other: coder@kanga.nu
--=| A man is as sane as he is dangerous to his environment |=--