Re: [Mailman-Developers] Huge lists

On Wed, 24 May 2000 20:47:21 -0700 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
At 6:38 PM -0700 5/24/2000, J C Lawrence wrote:
Umm, true. Looking at it again, and doing a quick check of my user base's MXing, I suspect we're dealing with a less than 1% gain. Bigger fish are available. Methinks my brain was farting.
I don't believe that a list server has any business handling MX sorting unless it is also taking responsibility for being the list MTA. As Mailman isn't, it's a moot point.
While I really have no say here, were I Barry and Co I'd be comfortable with targeting Mailman as able to handle a mid/high six-digit subscriber base list on mid-range PC-class hardware given suitable system configuration. That wouldn't be the target of course, just the "it must be physically able to work here" metric.
(being able to handle a moderately busy 25,000 user list, say 15-30 messages a day...
Average traffic levels are never the problem. It's the bursts you have to worry about, especially given the enforced latency of a moderated list and the resultant likely grouping of broadcasts. I usually end up moderating/approving messages in groups of 5, making bursts of ~5K messages to the MTA (current largest list has a little under 1K subscribers). It is the burst aspect that's possibly the main reason the MTA delivery process needs to be made asynchronous from the rest of the list server.
Were delivery to the MTA separated from the receipt or CGI process (i.e. mail is received, the RCPT list attached to it, and the tuple placed on a queue for background processing via forked process or cron job), we wouldn't be having this discussion. It's a fairly invasive change to the current Mailman architecture, but making the whole receipt/broadcast aspect asynchronous offers some really pleasant future avenues.
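A minimal sketch of the queueing idea described above, assuming a plain spool directory with one file per (message, RCPT list) tuple. The names `enqueue`, `run_queue`, and the `deliver` callback are hypothetical and not part of Mailman; the receiving side only writes a file, and a separate runner (cron job or forked process) does the slow hand-off to the MTA.

```python
import os
import pickle
import tempfile
import time


def enqueue(queue_dir, message_text, recipients):
    """Store one (message, recipients) tuple as a queue file; return its path."""
    os.makedirs(queue_dir, exist_ok=True)
    # A timestamp prefix keeps the files in roughly arrival order when sorted.
    prefix = "%017.6f-" % time.time()
    # Write under a .tmp name first so the runner never reads a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=prefix, suffix=".tmp")
    with os.fdopen(fd, "wb") as fp:
        pickle.dump((message_text, list(recipients)), fp)
    final_path = tmp_path[:-len(".tmp")] + ".msg"
    os.rename(tmp_path, final_path)
    return final_path


def run_queue(queue_dir, deliver):
    """Hand every queued tuple to deliver(); return how many were delivered."""
    delivered = 0
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".msg"):
            continue  # skip files still being written
        path = os.path.join(queue_dir, name)
        with open(path, "rb") as fp:
            message_text, recipients = pickle.load(fp)
        # deliver() is where the hand-off to the MTA would happen.
        deliver(message_text, recipients)
        # Only reached if deliver() did not raise, so failures stay queued.
        os.remove(path)
        delivered += 1
    return delivered
```

With this split, a burst of approvals costs the CGI process only a few file writes; how fast the runner drains the spool can then be tuned independently of receipt.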
Valid points.
<chortle>
Postfix looks like a *real* win, but until I run it through its paces, I won't use it.
Exactly where I'm at on it. I'm about to roll my desktops over to it, and let it stew there for a couple weeks.
I'm doing 400-500,000 an hour out of my mail system without trying too hard, using sendmail 8.9.3, and peaks approaching 900K.
My traffic on Kanga.Nu (hobby lists: http://www.kanga.nu/lists/listinfo/) is bursty and low enough that I just never get any hours with solidly active spools. I average around 30K - 40K deliveries per hour with the MTA sitting idle for much of that hour. 96% of messages are delivered within 60 seconds of hitting the queue, 98.1% within the hour -- we're basically talking a pretty idle mail system.
So eking out more performance by swapping MTAs is not a priority)
That's one of the main reasons I've been so lackadaisical about moving to Postfix -- I don't really need to. The only thing driving it is my own interest.
NB as a contractor.
As long as we're into disclosure, I run a bunch of hobby lists at plaidworks.com...
Just been poking around there and noticed that your archives seem to be inop (dead disk). If you're interested I've been messing about with MHonArc and PHP in my spare time and have almost finished getting a setup that:
-- Allows archived messages to be replied to on the web via the archive page (replies post to the list).
-- Templates (PHPLIB) the entire archive appearance. All MHonArc does is the parsing and data extraction.
-- Supports archive searching by MessageID. I've an MTA hack that inserts a MessageID-based URL into all outgoing Mailman list traffic so the user can just hit the URL and be taken to that message in the archives (searches the MHonArc DB, useful for thread reference etc).
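As a rough illustration of the Message-ID link in the last item, the sketch below builds an archive URL from a message's Message-ID and attaches it to the outgoing message. The original is described as an MTA hack whose details are not given here; the base URL, the `X-Archive-URL` header name, and both function names are assumptions made for the example only.

```python
from email.message import EmailMessage
from urllib.parse import quote

# Hypothetical archive lookup endpoint; not a real URL from the thread.
ARCHIVE_BASE = "http://lists.example.com/archive/msgid/"


def archive_url(message_id, base=ARCHIVE_BASE):
    """Build a lookup URL from a Message-ID, dropping the angle brackets."""
    bare_id = str(message_id).strip().strip("<>")
    # Percent-encode everything, including "@" and "/", so the ID is one path segment.
    return base + quote(bare_id, safe="")


def add_archive_link(msg, base=ARCHIVE_BASE):
    """Attach the archive URL as a header (header name is an assumption)."""
    message_id = msg["Message-ID"]
    if message_id is None:
        return msg  # nothing to link to
    del msg["X-Archive-URL"]  # no-op if absent; avoids duplicate headers
    msg["X-Archive-URL"] = archive_url(message_id, base)
    return msg


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Message-ID"] = "<abc123@example.com>"
    msg.set_content("Hello, list.")
    print(add_archive_link(msg)["X-Archive-URL"])
```

A header is used here only because it is the least invasive place to put the URL; a footer line in the body would be more visible to readers but means rewriting the message content.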
Hopefully I'll get something worth public viewing sometime next week.
-- J C Lawrence Home: claw@kanga.nu ----------(*) Other: coder@kanga.nu --=| A man is as sane as he is dangerous to his environment |=--

At 9:52 PM -0700 5/24/2000, J C Lawrence wrote:
Nope, just getting a bit ahead and thinking of something that's fun and technically challenging. Been there, done that... (grin).
And that's an issue I've been wrestling with a lot -- do I do a specialized MTA? Or do I let the MTA do its job? After going back and forth on this for weeks, given my current delivery rates, I've decided to let the MTA do its job, and wait on writing a specialized MTA until I need that last couple of percent of performance. Moving to sendmail 8.10.1 seems like an easy performance bump, Postfix looks like it'll buy me even more, and so while doing all the MXing and stuff would be fun, it can wait.
And I don't think that's a bad metric.
Queue management is another issue. That's one place where Majordomo is weak, because it doesn't do any. Everything is delivered as it comes in, so bursts can bring a system to its knees.
Another thing to worry about... On my big system, I only do a few mailings a week, but they bunch together. So I've had to do a bunch of work on making sure the system deals with this rationally... when we were doing one mailing on a given day, that was easy, but we're doing both a text and an HTML variant going out together, and that really complicates life.
Well, this is probably preaching to the choir, but I've gotten quite convinced that you isolate every piece you can from every other piece, and document the interfaces. That makes it quite easy to swap out a new piece without affecting the rest of the system -- one of the huge complaints (valid!) about sendmail is that it's overly monolithic, and therefore way too complex for its own good. The system I've been building the last few months is finally at the point where I can swap in a new subscription system without worrying about the other parts (did that!), or completely re-arrange the delivery back end without affecting other pieces. And it makes it easier to borrow code and use it, too... (did that, too!)
Just been poking around there and noticed that your archives seem to be inop (dead disk).
I'm about 2/3 of the way through completely replacing the system, so the archives are on the new machine, but not released yet. Lots of chaos, but making progress. I hope to release an open beta of Mailman by Monday, if I can finish up some stuff (I'm replacing my list directory with a Yahoo-like tool, and need to get that running, and write the bridging material to get it started).
-- Allows archived messages to be replied to on the web via the archive page (replies post to the list).
Nice! Does it restrict posting access to registered users or is it open?
-- Templates (PHPLIB) the entire archive appearance. All MHonArc does is the parsing and data extraction.
Good stuff. That's what Sympa does, too. It's a nice setup. MHonArc is quite a nice archiver. I used to use it, and then switched my web archives to a full forum system (Web Crossing) and crosslinked everything. That has its advantages and disadvantages.
Interesting hack. Very interesting hack. You could do something really nice with PHP and MySQL, too, and do away with MHonArc, and parse/templatize the text on the fly. That's sort of where I'm headed down the road....
-- Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com) Apple Mail List Gnome (mailto:chuq@apple.com)
And they sit at the bar and put bread in my jar and say 'Man, what are you doing here?'"

"JCL" == J C Lawrence <claw@kanga.nu> writes:
JCL> While I really have no say here, were I Barry and Co I'd be
JCL> comfortable with targeting Mailman as able to handle a
JCL> mid/high six-digit subscriber base list on mid-range PC-class
JCL> hardware given suitable system configuration. That wouldn't
JCL> be the target of course, just the "it must be physically able
JCL> to work here" metric.
Bingo.
JCL> Were delivery to the MTA separated from the receipt or CGI
JCL> process (i.e. mail is received, the RCPT list attached to it,
JCL> and the tuple placed on a queue for background processing via
JCL> forked process or cron job), we wouldn't be having this
JCL> discussion. It's a fairly invasive change to the current
JCL> Mailman architecture, but making the whole receipt/broadcast
JCL> aspect asynchronous offers some really pleasant future
JCL> avenues.
Yes.
JCL> Just been poking around there and noticed that your archives
JCL> seem to be inop (dead disk). If you're interested I've been
JCL> messing about with MHonArc and PHP in my spare time and have
JCL> almost finished getting a setup that:
JCL> -- Allows archived messages to be replied to on the web via
JCL> the archive page (replies post to the list).
JCL> -- Templates (PHPLIB) the entire archive appearance. All
JCL> MHonArc does is the parsing and data extraction.
JCL> -- Supports archive searching by MessageID. I've an MTA
JCL> hack that inserts a MessageID-based URL into all outgoing
JCL> Mailman list traffic so the user can just hit the URL and be
JCL> taken to that message in the archives (searches the MHonArc
JCL> DB, useful for thread reference etc).
JCL> Hopefully I'll get something worth public viewing sometime
JCL> next week.
Please do, these sound very cool. One of the things high on my list is to templatize the UI for the web interface so it can be integrated into existing sites more seamlessly. I know some guys who are doing a cool project in Python that might provide the necessary functionality, but on the other hand PHP might be fun to look into too.
-Barry

On Fri, Jun 02, 2000 at 05:17:05PM -0400, Barry A. Warsaw wrote:
[ JC Lawrence about archive/database/php ]
JCL> Hopefully I'll get something worth public viewing sometime
JCL> next week.
Not that I disagree (oh, no! It sounds cool! :) but wasn't there something about Mailman having to be coded in Python? Or is a PHP frontend OK? Or only if it's optional, or not included in the distribution?
I am still looking at HyperMail/pipermail, but if these things are in the running, I might just do some cleanup and fix some of the performance problems. (Like Hypermail choking on attachments and stuff.) So it's still useable for the bare-bones kind of server ;) If not, well, I'll take some more time and not worry too much about features such as searchable indexes and such.
-- Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

There is a problem in that it's hard to come up with metrics for best large recipient list delivery that work across a range of MTAs.
My own feeling (experience-based feeling, but I haven't specifically sat down and benchmarked/analysed its behaviour) for exim is that I would tend to have a maximum number of recipients for a single SMTP transaction of ~500 recipients. I would tend to keep simultaneous injects down to around 4 at a time... although this has even less basis than the 500 recipients limit. Exim will take advantage of multiple recipients on the same MX, but you can basically assume it will not take advantage of multiple queued messages going to the same MX (except under particular circumstances). Hence I would in general do the sort/clump by reversed domain name thing since that should win in many cases.
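The sort/clump step can be sketched as follows, assuming addresses are plain `local@domain` strings. The function names are hypothetical, and the 500-recipient default is simply the rule of thumb above rather than a measured limit. Sorting on the reversed domain puts `example.com` and `mail.example.com` next to each other, so recipients that are likely to share an MX tend to land in the same SMTP transaction.

```python
def reversed_domain_key(address):
    """Return the domain with its labels reversed, e.g. 'com.example.mail'."""
    domain = address.rpartition("@")[2].lower()
    return ".".join(reversed(domain.split(".")))


def chunk_recipients(addresses, max_rcpts=500):
    """Sort by reversed domain, then split into lists of at most max_rcpts."""
    ordered = sorted(
        addresses,
        key=lambda addr: (reversed_domain_key(addr), addr.lower()),
    )
    return [ordered[i:i + max_rcpts] for i in range(0, len(ordered), max_rcpts)]


if __name__ == "__main__":
    rcpts = ["a@mail.example.com", "b@foo.org", "c@example.com", "d@FOO.org"]
    for chunk in chunk_recipients(rcpts, max_rcpts=2):
        print(chunk)
```

This does no MX lookups and will split a large domain across a chunk boundary; it only orders by name, which is the cheap approximation being discussed. Limiting the number of simultaneous injects would be the job of whatever loop submits these chunks to the MTA.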
bwarsaw@python.org said:
That's a very parochial view of the world (he says, tongue firmly in cheek). Us UK-based people would probably find a different balance would work better for us :-)
Nigel.
-- [ - Opinions expressed are personal and may not be shared by VData - ] [ Nigel Metheringham Nigel.Metheringham@VData.co.uk ] [ Phone: +44 1423 850000 Fax +44 1423 858866 ]

participants (5): bwarsaw@python.org, Chuq Von Rospach, J C Lawrence, Nigel Metheringham, Thomas Wouters