[Mailman-Developers] Huge lists

J C Lawrence claw@kanga.nu
Wed, 24 May 2000 21:52:49 -0700


On Wed, 24 May 2000 20:47:21 -0700 
Chuq Von Rospach <chuqui@plaidworks.com> wrote:

> At 6:38 PM -0700 5/24/2000, J C Lawrence wrote:

>> True.  My curiosity however is which MTAs do MX sorting, and more
>> particularly, MX collapsing (eg for two different targets that
>> share an MX among their lowest level).  The potential gains
>> there are likely not huge, but could be (guesstimate) noticeable
>> for high volume servers with broad standard deviations in their
>> target lists.
>> 
>> I'll have to check into that some time.

> but -- as the experts say, the first $500 buys you 90% of the
> stereo response, and the rest of the money goes into getting you
> as close to 100% as you can get. 

Umm, true.  Looking at it again, and doing a quick check of my user
base's MXing, I suspect we're dealing with a less than 1% gain.
Bigger fish are available.  Methinks my brain was farting.
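
For anyone wondering what I mean by MX collapsing, here is a rough
sketch in Python.  It assumes the MX records have already been looked
up and handed in as a plain dict (no DNS calls here), and it simplifies
"share an MX among their lowest level" down to "have exactly the same
set of lowest-preference MX hosts".  The function name and the sample
data are made up for illustration; no MTA I know of exposes this API.

```python
from collections import defaultdict


def collapse_by_mx(recipients, mx_records):
    """Group recipients whose domains share the same best MX hosts.

    mx_records is a hypothetical, pre-resolved mapping of
    domain -> list of (preference, host) tuples.
    """
    groups = defaultdict(list)
    for addr in recipients:
        domain = addr.rsplit("@", 1)[1].lower()
        records = mx_records.get(domain, [])
        if records:
            # Lowest preference value means most preferred exchanger.
            best = min(pref for pref, _host in records)
            key = frozenset(
                host.lower() for pref, host in records if pref == best
            )
        else:
            # No MX known: fall back to the domain itself as the target.
            key = frozenset([domain])
        groups[key].append(addr)
    return dict(groups)
```

Each resulting group could in principle go out over one SMTP
connection with several RCPT TOs, which is where any gain would come
from.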

> MX sorting is definitely far up into that 90% range,
> computationally and time expensive, and lots of other stuff can be
> done first, with more gain, and less effort.

I don't believe that a list server has any business handling MX
sorting unless it is also taking responsibility for being the list
MTA.  As Mailman isn't, it's a moot point.

> Maybe one thing we need is a definition of what Mailman is and
> what it isn't. Some kind of target for the size of lists it wants
> to reasonably support. If it's 5,000 users, it doesn't matter what
> you do. If it's 50,000, or 500,000, you definitely have different
> requirements.

While I really have no say here, were I Barry and Co I'd be
comfortable with targeting Mailman as able to handle a mid/high
6-digit subscriber base list on mid-range PC-class hardware given
suitable system configuration.  That wouldn't be the target of
course, just the "it must be physically able to work here" metric.

> (being able to handle a moderately busy 25,000 user list, say
> 15-30 messages a day...

Average traffic levels are never the problem.  It's the bursts you
have to worry about, especially given the enforced latency of a
moderated list and the resultant likely grouping of broadcasts.  I
usually end up moderating/approving messages in groups of 5 making
bursts of ~5K messages to the MTA (current largest list has a little
under 1K subscribers).  It is the burst aspect that's possibly the
main reason the MTA delivery process needs to be made asynchronous
from the rest of the list server.

> It'd be nice to be able to say "5 million subscribers in 2
> minutes!", but focus on a solid "do most things for most folks"
> now, and add the high performance/huge list support in 2.5. But
> leave the hooks in, so we don't have to rewrite later....)

Were delivery to the MTA separated from the receipt or CGI process
(ie mail is received, the RCPT list attached to it, and the tuple
placed on a queue for background processing via forked process or
cron job), we wouldn't be having this discussion.  It's a fairly
invasive change to the current Mailman architecture, but making the
whole receipt/broadcast aspect asynchronous offers some really
pleasant future avenues.
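
A minimal sketch of the idea in Python, assuming a plain spool
directory as the queue.  None of this is existing Mailman code: the
names enqueue() and run_queue(), the pickle file format, and the
deliver callback (which would wrap the actual hand-off to the MTA) are
all hypothetical.

```python
import os
import pickle
import tempfile
import time


def enqueue(queue_dir, message_text, recipients):
    """Receipt/CGI side: store the (message, RCPT list) tuple and return."""
    os.makedirs(queue_dir, exist_ok=True)
    # Write to a temp file first so the runner never sees a partial entry.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as fp:
        pickle.dump((message_text, list(recipients)), fp)
    unique = os.path.basename(tmp_path)[:-len(".tmp")]
    # Zero-padded timestamp prefix so a name sort is roughly oldest first.
    final_name = "%020d-%s.pck" % (time.time_ns(), unique)
    final_path = os.path.join(queue_dir, final_name)
    os.rename(tmp_path, final_path)
    return final_path


def run_queue(queue_dir, deliver):
    """Background side (forked process or cron job): drain the queue."""
    delivered = 0
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".pck"):
            continue  # skip entries still being written
        path = os.path.join(queue_dir, name)
        with open(path, "rb") as fp:
            message_text, recipients = pickle.load(fp)
        deliver(message_text, recipients)
        # Only remove the entry once deliver() has returned without error.
        os.remove(path)
        delivered += 1
    return delivered
```

The point is that enqueue() costs one file write no matter how many
recipients there are, so a burst of approvals returns immediately and
the MTA hand-off happens at whatever pace the background runner can
sustain.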

>> While it's cheap logic, it's easy to note that none of the very
>> high volume commercial email sites out there are based on
>> Sendmail (Critical Path, Hotmail, Onelist, EGroups, etc).

> Valid points. 

<chortle>

> Postfix looks like a *real* win, but until I run it through its
> paces, I won't use it. 

Exactly where I'm at on it.  I'm about to roll my desktops over to
it, and let it stew there for a couple weeks.

> I'm doing 400-500,000 an hour out of my mail system without trying
> too hard, using sendmail 8.9.3, and peaks approaching 900K. 

My traffic on Kanga.Nu (hobby lists: http://www.kanga.nu/lists/listinfo/) 
is bursty and low enough that I just never get any hours with
solidly active spools.  I average around 30K - 40K deliveries per
hour with the MTA sitting idle for much of that hour.  96% of
messages are delivered within 60 seconds of hitting the queue, 98.1%
within the hour -- we're basically talking a pretty idle mail
system.  

> So eking out more performance by swapping MTAs is not a priority)

That's one of the main reasons I've been so lackadaisical about
moving to Postfix -- I don't really need to.  The only thing driving 
it is my own interest.

>>> As someone who deals with email for a living...
>> 
>> I should probably note at this point that I'm working for
>> Critical Path on their mail systems.

NB as a contractor.

> As long as we're into disclosure, I run a bunch of hobby lists at
> plaidworks.com...

Just been poking around there and noticed that your archives seem to
be inop (dead disk).  If you're interested I've been messing about
with MHonArc and PHP in my spare time and have almost finished
getting a setup that:

  -- Allows archived messages to be replied to on the web via the
     archive page (replies post to the list).

  -- Templates (PHPLIB) the entire archive appearance.  All MHonArc
     does is the parsing and data extraction.

  -- Supports archive searching by MessageID.  I've an MTA hack that
     inserts a MessageID-based URL into all outgoing Mailman
     list traffic so the user can just hit the URL and be taken to
     that message in the archives (searches the MHonArc DB, useful
     for thread reference etc).
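
The URL insertion in the last item can be sketched roughly as below,
in Python rather than as the MTA hack itself.  The base URL, the "id"
query parameter, and the choice of an X-Archived-At header (instead of
a line in the body) are all assumptions for illustration, not the
actual setup.

```python
from email import message_from_string
from urllib.parse import quote

# Hypothetical search endpoint; the real archive URL layout may differ.
ARCHIVE_SEARCH_URL = "http://lists.example.org/archive/msgid.php"


def archive_url(message_id, base=ARCHIVE_SEARCH_URL):
    """Build a lookup URL from a Message-ID, dropping the angle brackets."""
    bare = message_id.strip().lstrip("<").rstrip(">")
    # Percent-encode everything unsafe, including "@" and "/".
    return "%s?id=%s" % (base, quote(bare, safe=""))


def tag_message(raw_text, base=ARCHIVE_SEARCH_URL, header="X-Archived-At"):
    """Return the message text with an archive URL header added."""
    msg = message_from_string(raw_text)
    message_id = msg["Message-ID"]
    if message_id:
        msg[header] = archive_url(message_id, base)
    return msg.as_string()
```

The archive side then only needs to look the Message-ID up in the
MHonArc DB and redirect to the matching message page.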

Hopefully I'll get something worth public viewing sometime next
week.

-- 
J C Lawrence                                 Home: claw@kanga.nu
----------(*)                              Other: coder@kanga.nu
--=| A man is as sane as he is dangerous to his environment |=--