Re: [Mailman-Developers] Huge lists

On Wed, 24 May 2000 17:08:18 -0700 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
At 3:41 PM -0700 5/24/2000, J C Lawrence wrote:
True. My curiosity however is what MTAs do MX sorting, and more particularly, MX collapsing (eg for two different targets that share an MX among their lowest level). The potential gains there are likely not huge, but could be (guesstimate) noticeable for high volume servers with broad standard deviations in their target lists.
I'll have to check into that some time.
True, this would be a useful optimisation for most of the MTA architectures I know of. It's also quite cheap and easy to do, which makes it even more tempting.
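For anyone who wants to see what MX collapsing might look like, here is a minimal sketch in Python. It is not what any particular MTA does. The `lookup_mx` callable is a hypothetical stand-in for a real resolver, and "shares an MX" is simplified to "has exactly the same set of lowest-preference MX hosts".

```python
from collections import defaultdict


def collapse_by_mx(recipients, lookup_mx):
    """Group recipient addresses by their set of lowest-preference MX hosts.

    lookup_mx is a hypothetical callable (an assumption for this sketch):
    it takes a domain and returns a list of (preference, host) tuples.
    """
    cache = {}                      # one lookup per domain, not per address
    groups = defaultdict(list)
    for addr in recipients:
        domain = addr.rsplit("@", 1)[1].lower()
        if domain not in cache:
            records = lookup_mx(domain)
            if records:
                best = min(pref for pref, _ in records)
                cache[domain] = frozenset(
                    host.lower() for pref, host in records if pref == best
                )
            else:
                # No MX records: fall back to the domain itself.
                cache[domain] = frozenset([domain])
        groups[cache[domain]].append(addr)
    return dict(groups)
```

Each group could then be handed to the MTA as one batch, so two domains that share a primary MX end up on the same connection.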
I guess that we need a per MTA tuning/configuration document.
Aaaargh. Yes.
There gets to be a point however where it really exceeds Mailman's charter. Mailman is a list server, not a training course on how to build and configure a high volume mail system. While I don't think we've crossed or even approached that line, in general I'd rather spend time on Mailman than high end server considerations which are adequately (?) documented elsewhere.
I've tested it here under Exim (as of about 2 years ago). The gains were quite noticeable for leaving it to the MTA for connection-time resolution. Mostly, I suspect, because Exim didn't cache (or pre-stuff) the DNS results from the validity check for MX delivery. Actually, I don't think Exim maintains a significant DNS cache across delivery attempts in the first place, assuming, quite rightly in the general case, that the local nameserver can be trusted to do that caching for it. I haven't checked this tho, as my need (I had a 140K member list) disappeared (the company sponsoring the list collapsed).
How about Postfix? Anybody know?
Postfix is "on the list" for later this summer for me...
I followed Postfix actively in its early days, up till about a year after first public release when I got distracted elsewhere (I used to publicly archive all the Postfix lists here at Kanga.Nu). I figure I'll probably roll everything over to Postfix sometime in a couple months, tho I'll miss Exim's nice log analysis and queue tools.
Right now, I generally recommend sites doing a lot of mail-list traffic...
I generally recommend heartily against Sendmail for such sites. I just don't see it as worth the extra effort (or obscurity) when newer MTAs such as Exim (wot I use currently), QMail or Postfix in general offer the same or better performance and configurability with the added benefit of human readable/auditable config files.
While it's cheap logic, it's easy to note that none of the very high volume commercial email sites out there are based on Sendmail (Critical Path, Hotmail, Onelist, EGroups, etc).
Of course not. Everybody knows that Microsoft Exchange is the one true MTA and all else are but pale imitations.
don't even JOKE about that.
You don't know how many times I've nearly uncommented the Exim rule that would auto-bounce (during SMTP receipt) any message with an Exchange entry in the Received headers. It has been tempting.
The only mail software out there that draws more ire from me is Outlook. Pathetic. Absolutely pathetic. Of course I also have a still-commented-out procmail rule in place before Mailman that would auto-bounce messages from Outlook, and the only reason I haven't uncommented it is that I have too many valued list members who cannot use anything else (corporate standards).
<sigh>
As someone who deals with email for a living...
I should probably note at this point that I'm working for Critical Path on their mail systems.
...the only system that comes *close* to Exchange in the braindead category is Lotus Notes.
Sorry, entirely different orders of magnitude there. Notes is bad, certainly, and there are few things even close to being as bad as Notes or cc:Mail (tho they've gotten a lot better in recent years (which isn't saying much)), but Exchange/Outlook make them look positively angelic in comparison.
I got some nice filters...
You might as well drive your computers with a squirrel on a wheel.
Nope. That's Notes. Exchange? Remember the dead parrot skit...?
-- J C Lawrence Home: claw@kanga.nu ----------(*) Other: coder@kanga.nu --=| A man is as sane as he is dangerous to his environment |=--

At 6:38 PM -0700 5/24/2000, J C Lawrence wrote:
but -- as the experts say, the first $500 buys you 90% of the stereo response, and the rest of the money goes into getting you as close to 100% as you can get. MX sorting is definitely far up into that 90% range, computationally and time expensive, and lots of other stuff can be done first, with more gain, and less effort. For most lists, the differential in performance between domain sorting and MX sorting is probably not statistically meaningful.
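For comparison, domain sorting needs no DNS at all. A minimal sketch in Python, assuming recipients are plain address strings; the function name is made up for illustration:

```python
def sort_by_domain(recipients):
    """Order addresses so that all recipients at one domain are adjacent."""
    def key(addr):
        local, _, domain = addr.rpartition("@")
        # Domains compare case-insensitively; the local part only breaks ties.
        return (domain.lower(), local.lower())
    return sorted(recipients, key=key)
```

With the list in this order, an MTA that batches consecutive recipients per connection can deliver everything for one domain together. MX sorting would add a DNS lookup per domain on top of this.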
Maybe one thing we need is a definition of what Mailman is and what it isn't. Some kind of target for the size of lists it wants to reasonably support. If it's 5,000 users, it doesn't matter what you do. If it's 50,000, or 500,000, you definitely have different requirements.
So defining what mailman wants to solve can help us clear these things up. "Every list in the universe" is a laudable goal, but it'll probably delay shipping 2.0 for a decade or so... So I'd like to suggest some performance goals be defined, and then program to those, so we're all on the same page.
(being able to handle a moderately busy 25,000 user list, say 15-30 messages a day, would probably cover 95% of the mailing lists in the universe, and still technologically well within reach... It'd be nice to be able to say "5 million subscribers in 2 minutes!", but focus on a solid "do most things for most folks" now, and add the high performance/huge list support in 2.5. But leave the hooks in, so we don't have to rewrite later....)
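To put rough numbers on those targets, here is a back-of-the-envelope sketch in Python. It assumes every message goes to every subscriber and ignores retries and bounces; the function names are invented for illustration.

```python
def deliveries_per_day(subscribers, messages_per_day):
    """Total individual deliveries generated by one list in a day."""
    return subscribers * messages_per_day


def required_rate(subscribers, seconds):
    """Deliveries per second needed to reach every subscriber in the window."""
    return subscribers / seconds


# The "moderately busy 25,000 user list" at 15-30 messages a day:
low = deliveries_per_day(25_000, 15)    # 375000 deliveries a day
high = deliveries_per_day(25_000, 30)   # 750000 deliveries a day

# "5 million subscribers in 2 minutes":
burst = required_rate(5_000_000, 120)   # roughly 41,667 deliveries a second
```

The first target works out to well under a million deliveries a day, while the second needs tens of thousands of deliveries every second, which is why the two probably belong in different releases.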
True, but a one page README.<mlm> page in the distro for each reasonably supported MLM isn't a bad thing, and better than what anyone has. Because one reality is that most MLMs are configured (especially out of the box) to manage incoming mail, and efficient handling of outgoing mail is very different. Some hints on dealing with those optimizations and tradeoffs can't hurt, and wouldn't have to be significant or huge efforts.
I tend to agree -- but performance of mailman is inextricably tied to performance and interface with the MTA. If you ignore the MTA, your chances of making mailman work well are very small, and users will tend to blame mailman, because "sendmail worked fine before we installed mailman, so...."
Valid points. But sendmail is a default-install in many installations, and so it's going to be what's available. So helping people figure out how to best make use of it is important; refusing to would be sort of like refusing to let AOL users on a list. Yes, some AOL users can be problems, but AOL users also tend to be a huge part of an audience (on my machines, 15% isn't uncommon).
Postfix looks like a *real* win, but until I run it through its paces, I won't use it. But the people I know who do use it love it. And I've got other fish to fry before moving to postfix (and right now, I'm doing 400-500,000 an hour out of my mail system without trying too hard, using sendmail 8.9.3, and peaks approaching 900K. So eking out more performance by swapping MTAs is not a priority.)
As long as we're into disclosure, I run a bunch of hobby lists at plaidworks.com, but I also do most of the mail list stuff at Apple, where there's a combination of off the shelf (or actually, heavily hacked) majordomo and custom jobs, so my lists range from really tiny (10-12) to very, very large. The large system is custom coded, with the exception of the last remnant, which is bulk_mailer. I've completely replaced everything else, and bulk_mailer's replacement is going into test as soon as I finish it (and it'll fully VERP; although I had a bit of a scare last week when I was doing some throughput estimates and got some zeros wrong, and thought for a while that my total delivery was going to range into the terabytes. I was wrong, thank ghu, and it's merely in the range of 40-60 gigabytes per mailing....)
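For readers who haven't met VERP: the idea is to encode each recipient into the envelope sender, so a bounce identifies the failing address by where it arrives. A minimal sketch in Python follows. The `+` and `=` layout is one common convention and the function names are hypothetical; this is not the code from the system described above.

```python
def verp_sender(bounce_addr, recipient):
    """Build a per-recipient envelope sender from the list's bounce address."""
    local, domain = bounce_addr.rsplit("@", 1)
    rlocal, rdomain = recipient.rsplit("@", 1)
    return f"{local}+{rlocal}={rdomain}@{domain}"


def verp_recipient(envelope_sender):
    """Recover the original recipient from a VERP envelope sender."""
    local, _ = envelope_sender.rsplit("@", 1)
    # Everything after the first "+" is the encoded recipient.
    _, encoded = local.split("+", 1)
    # Domains cannot contain "=", so the last "=" marks the boundary.
    rlocal, rdomain = encoded.rsplit("=", 1)
    return f"{rlocal}@{rdomain}"
```

The cost is that every recipient gets a distinct envelope sender, so recipients can no longer share one SMTP transaction, which is part of why total delivery volume climbs.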
Notes is obnoxious, especially since return-receipt is an administrator controlled option, and not smart enough to NOT r-r mailing lists (or anything else), and I've found Notes administrators about as obnoxious as their software when you point things like that out. The only word I can use for Exchange is brutal. There are Exchange sites out there whose idea of a bounce message is to return the mail to the "to:" line with only the Message-ID changed. You can imagine how much fun THAT is.
Those sites (fortunately rare, all broken, but at least two of them have been broken that way for four bloody years) my site simply blackholes.
-- Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com) Apple Mail List Gnome (mailto:chuq@apple.com)
And they sit at the bar and put bread in my jar and say 'Man, what are you doing here?'"

"CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:
CVR> True, but a one page README.<mlm> page in the distro for each
CVR> reasonably supported MLM isn't a bad thing, and better than
CVR> what anyone has.
Did you typo here or am I misunderstanding? Mailman has README's for Qmail and Sendmail. They could probably be elaborated on, and other MTAs could be added, but is this what you were looking for?
-Barry

Yes, that's what I'm suggesting -- adding to or elaborating on these READMEs to discuss how to set them up and optimize them.
-- Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com) Apple Mail List Gnome (mailto:chuq@apple.com)
And they sit at the bar and put bread in my jar and say 'Man, what are you doing here?'"

"CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:
>> Did you typo here or am I misunderstanding? Mailman has
>> README's for Qmail and Sendmail. They could probably be
>> elaborated on, and other MTAs could be added, but is this what
>> you were looking for?
CVR> Yes, that's what I'm suggesting -- adding to or elaborating
CVR> on these READMEs to discuss how to set them up and optimize
CVR> them.
Great idea, and I eagerly await contributions! :)
-Barry

participants (3): bwarsaw@python.org, Chuq Von Rospach, J C Lawrence