[Mailman-Developers] Huge lists

Chuq Von Rospach chuqui@plaidworks.com
Wed, 24 May 2000 22:13:08 -0700

At 9:52 PM -0700 5/24/2000, J C Lawrence wrote:

>Umm, true.  Looking at it again, and doing a quick check of my user
>base's MXing, I suspect we're dealing with a less than 1% gain.
>Bigger fish are available.  Methinks my brain was farting.

Nope, just getting a bit ahead and thinking of something that's fun 
and technically challenging. Been there, done that... (grin).

>I don't believe that a list server has any business handling MX
>sorting unless it is also taking responsibility for being the list
>MTA.  As Mailman isn't, it's a moot point.

And that's an issue I've been wrestling with a lot -- do I do a 
specialized MTA? Or do I let the MTA do its job? After going back and 
forth on this for weeks, given my current delivery rates, I've 
decided to let the MTA do its job, and wait on writing a specialized 
MTA until I need that last couple of percent of performance. Moving 
to sendmail 8.10.1 seems like an easy performance bump, and Postfix 
looks like it'll buy me even more, so while doing all the MXing and 
stuff would be fun, it can wait.
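
For readers unfamiliar with the "MX sorting" being discussed, here is a minimal sketch of the simplest form of the idea: batching recipients by domain so each batch can be handed over together. It is an illustration only. It groups by the domain in the address rather than doing real MX lookups, and the function name `group_by_domain` is hypothetical, not anything from Mailman or from either poster's system.

```python
from collections import defaultdict


def group_by_domain(recipients):
    """Group recipient addresses by their (lowercased) domain part.

    This is only a stand-in for true MX sorting: no DNS lookups are
    done, so two domains sharing one mail exchanger stay separate.
    """
    groups = defaultdict(list)
    for addr in recipients:
        # Take everything after the last "@" as the domain.
        domain = addr.rsplit("@", 1)[-1].lower()
        groups[domain].append(addr)
    return dict(groups)


if __name__ == "__main__":
    batches = group_by_domain(
        ["a@Example.com", "b@example.org", "c@example.com"]
    )
    for domain, addrs in batches.items():
        print(domain, addrs)
```

As the thread concludes, a general-purpose MTA already does this kind of work, so a list server that is not its own MTA gains little from it.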

>While I really have no say here, were I Barry and Co I'd be
>comfortable with targeting Mailman as able to handle a mid/high
>6-digit subscriber base list on mid-range PC-class hardware given
>suitable system configuration.  That wouldn't be the target of
>course, just the "it must be physically able to work here" metric.

And I don't think that's a bad metric.

>> (being able to handle a moderately busy 25,000 user list, say
>> 15-30 messages a day...
>Average traffic levels are never the problem.  It's the bursts you
>have to worry about, especially given the enforced latency of a
>moderated list and the resultant likely grouping of broadcasts.  I
>usually end up moderating/approving messages in groups of 5 making
>bursts of ~5K messages to the MTA (current largest list has a little
>under 1K subscribers).  It is the burst aspect that's possibly the
>main reason the MTA delivery process needs to be made asynchronous
>from the rest of the list server.

Queue management is another issue. That's one place Majordomo is 
weak, because it doesn't do any. Everything is delivered as it comes 
in, so bursts can take a system to its knees.

Another thing to worry about... On my big system, I only do a few 
mailings a week, but they bunch together. So I've had to do a bunch 
of work on making sure the system deals with this rationally...  when 
we were doing one mailing on a given day, that was easy, but we're 
doing both a text and an HTML variant going out together, and that 
really complicates life.

>Were delivery to the MTA separated from the receipt or CGI process
>(ie mail is received, the RCPT list attached to it, and the tuple
>placed on a queue for background processing via forked process or
>cron job), we wouldn't be having this discussion.

Well, this is probably preaching to the choir, but I've gotten quite 
convinced that you isolate every piece you can from every other 
piece, and document the interfaces. That makes it quite easy to swap 
out a new piece without affecting the rest of the system -- one of 
the huge complaints (valid!) on sendmail is it's overly monolithic, 
and therefore way too complex for its own good. The system I've been 
building the last few months is finally at the point where I can swap 
in a new subscription system without worrying about the other parts 
(did that!), or completely re-arrange the delivery back end without 
affecting other pieces. And it makes it easier to borrow code and use 
it, too... (did that, too!)
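
The decoupling described in the quoted text (a message arrives, its RCPT list is attached, and the pair is placed on a queue for background delivery) can be sketched roughly as below. This is a minimal in-process illustration under stated assumptions, not how Mailman or either poster's system works: `DeliveryQueue` and `deliver_to_mta` are hypothetical names, and a real setup would more likely use an on-disk queue drained by a forked process or cron job, as the quote suggests.

```python
import queue
import threading


def deliver_to_mta(message, recipients):
    """Hypothetical stand-in for handing one message to the MTA."""
    return len(recipients)


class DeliveryQueue:
    """Accept (message, recipients) tuples and deliver them in the background."""

    def __init__(self, deliver=deliver_to_mta):
        self._q = queue.Queue()
        self._deliver = deliver
        self.delivered = []  # tuples handed to the MTA successfully
        self.failed = []     # tuples whose delivery raised an exception
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def enqueue(self, message, recipients):
        # Returns immediately, so the receiving or CGI side never
        # blocks on the MTA, even during a burst of approvals.
        self._q.put((message, tuple(recipients)))

    def _run(self):
        while True:
            item = self._q.get()
            try:
                self._deliver(*item)
                self.delivered.append(item)
            except Exception:
                self.failed.append(item)
            finally:
                self._q.task_done()

    def wait(self):
        """Block until everything queued so far has been processed."""
        self._q.join()


if __name__ == "__main__":
    dq = DeliveryQueue()
    for i in range(5):  # e.g. five moderated messages approved at once
        dq.enqueue(f"message {i}", ["a@example.com", "b@example.com"])
    dq.wait()
    print(len(dq.delivered), "delivered,", len(dq.failed), "failed")
```

The only interface between the two halves is the queued tuple, so the delivery back end can be replaced without touching the receiving side.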

>Just been poking around there and noticed that your archives seem to
>be inop (dead disk).

I'm about 2/3 of the way through completely replacing the system, so 
the archives are on the new machine, but not released yet. Lots of 
chaos, but making progress. I hope to release an open beta of Mailman 
by Monday, if I can finish up some stuff (I'm replacing my list 
directory with a Yahoo-like tool, and need to get that running, and 
write the bridging material to get it started).

>   -- Allows archived messages to be replied to on the web via the
>      archive page (replies post to the list).

Nice! Does it restrict posting access to registered users or is it open?

>   -- Templates (PHPLIB) the entire archive appearance.  All MHonArc
>      does is the parsing and data extraction.

Good stuff. That's what Sympa does, too. It's a nice setup. MHonArc 
is quite a nice archiver. I used to use it, and then switched my web 
archives to a full forum system (Web Crossing) and crosslinked 
everything. That has its advantages and disadvantages.

>   -- Supports archive searching by MessageID.  I've an MTA hack that
>      inserts a MessageID-based URL into all outgoing Mailman
>      list traffic so the user can just hit the URL and be taken to
>      that message in the archives (searches the MHonArc DB, useful
>      for thread reference etc).

Interesting hack. Very interesting hack. You could do something 
really nice with PHP and MySQL, too, and do away with MHonArc, and 
parse/templatize the text on the fly. That's sort of where I'm headed 
down the road....
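
The Message-ID-based archive link described in the quoted text can be illustrated with a short sketch. Everything specific here is an assumption: the base URL, the `X-Archive-URL` header name, and the function names are hypothetical, and the original hack lives in the MTA, which this does not attempt to reproduce. It only shows how a Message-ID might be turned into a URL-safe lookup link.

```python
from urllib.parse import quote

# Hypothetical archive lookup endpoint; not a real address.
ARCHIVE_BASE = "http://lists.example.com/archive/msgid"


def archive_url(message_id, base=ARCHIVE_BASE):
    """Build a lookup URL from a Message-ID header value."""
    # Drop surrounding whitespace and the angle brackets.
    msgid = message_id.strip().strip("<>")
    # Percent-encode everything unsafe, including "@" and "/".
    return f"{base}/{quote(msgid, safe='')}"


def add_archive_header(headers, base=ARCHIVE_BASE):
    """Return a copy of the headers with an archive link added.

    "X-Archive-URL" is an assumed header name for illustration.
    """
    out = dict(headers)
    out["X-Archive-URL"] = archive_url(headers["Message-ID"], base)
    return out


if __name__ == "__main__":
    hdrs = {"Message-ID": "<abc123@host.example>", "Subject": "Huge lists"}
    print(add_archive_header(hdrs)["X-Archive-URL"])
```

The archive side would then need to search its index by the decoded Message-ID, whether that index is the MHonArc database or a MySQL table.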

Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

And they sit at the bar and put bread in my jar
and say 'Man, what are you doing here?'"