[Personal Ccs deleted... list only this time]
[It's nice to see you folks have been enjoying yourselves whilst I sleep. However I now have the advantage, so will respond to the dozen or so messages in the last batch :-) ]
chuqui@plaidworks.com said:
throwing hardware at a problem isn't always possible. but the place where rolling your own internal MTA starts becoming useful is when the list is big enough that the disk I/O involving the MTA starts becoming the significant limiter. With sendmail 8.9.x, that's fairly easy to run into. With sendmail 8.10, it seems to be better, and the multiple queue stuff solves a multitude of problems involving huge directory structures.
Wietse had some figures on MTA performance analysis which he used as part of the design process for Postfix. He concluded that disk I/O was *the* limiting factor for an MTA - remember that to comply with the RFCs you have to commit incoming data to stable storage before acknowledging receipt (ie the positive reply to SMTP end of data) - in all current mainstream MTAs that means that the queue file has to be closed and synced. Pushing data down to the rust and ensuring it's there stably limits things drastically. Wietse's tests should be on www.postfix.org.
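A minimal Python sketch of the commit-before-acknowledge rule described above. The function names, the file prefix and the reply text are made up for illustration, and no real MTA is claimed to be structured this way; the only point is that the positive reply is not produced until fsync has returned.

```python
import os
import tempfile


def commit_message(queue_dir: str, data: bytes) -> str:
    """Write one message to a queue file and force it to disk.

    Hypothetical helper: returns the path of the queue file only after
    the data has been flushed and fsync'ed.
    """
    fd, path = tempfile.mkstemp(dir=queue_dir, prefix="msg-")
    try:
        with os.fdopen(fd, "wb") as queue_file:
            queue_file.write(data)
            queue_file.flush()              # push Python's buffer to the OS
            os.fsync(queue_file.fileno())   # ask the OS to push it to disk
    except BaseException:
        os.unlink(path)                     # never leave a half-written file
        raise
    # A production MTA may also need to sync the directory entry;
    # that step is left out of this sketch.
    return path


def end_of_data(queue_dir: str, data: bytes) -> str:
    """Reply to SMTP end-of-data only once the message is safely stored."""
    commit_message(queue_dir, data)
    return "250 OK: queued"
```

Every accepted message pays for at least one synchronous disk write here, which is the cost that dominates once the list gets large.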
VERP exacerbates the problem, since # of batches sent to the MTA equals the # of addresses, which explodes the number of control files, which... So at some point, it makes sense to deliver direct to recipient rather than build batches into the MTA, and completely avoid the disk I/O and deliver right out of the database to the receiving SMTP client. You could strongly parallelize the delivery setup because you'd do away with all of the MTA overhead, and do all sorts of fun things, like prioritize your delivery sorting and the like.
If we have a million user list... and a message of a few K, I'm not sure I want to have a few GB of queue space taken up. If some idiot sends a 1M attachment I doubt many of us have the TB spool space.
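The arithmetic behind those figures, as a small Python sketch. It assumes the worst case where every recipient gets its own queued copy (which is what per-address batches imply), takes "a few K" to be 4 KB, and uses decimal units; all three are assumptions made for the example.

```python
KB = 1_000
MB = 1_000_000
GB = 1_000_000_000
TB = 1_000_000_000_000


def spool_bytes(recipients: int, message_bytes: int) -> int:
    """Queue space needed if each recipient has a separate queued copy."""
    return recipients * message_bytes


small = spool_bytes(1_000_000, 4 * KB)   # a message of "a few K"
large = spool_bytes(1_000_000, 1 * MB)   # the 1M attachment case

print(f"{small / GB:.0f} GB")  # prints "4 GB"
print(f"{large / TB:.0f} TB")  # prints "1 TB"
```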
Having said that I *really* would like the possibility of the occasional message being sent out using VERP (maybe even just the password reminders, although I'd prefer a method where some messages are VERPed whenever the list has recently seen bounces that it cannot tie to a particular subscriber). However then we also need to recode the MTA incoming handling to take that - aliases don't cut it any more.
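To make the incoming-handling point concrete, here is a rough Python sketch of VERP encoding and decoding. The "+" and "=" separators and the example addresses are one common convention chosen for illustration, not something Mailman or any particular MTA is known to use here. The decode step is the part a static alias cannot do, because the local part is different for every subscriber.

```python
from typing import Optional


def verp_sender(bounce_addr: str, recipient: str) -> str:
    """Build a per-recipient envelope sender from the list's bounce address."""
    local, domain = bounce_addr.rsplit("@", 1)
    rcpt_local, rcpt_domain = recipient.rsplit("@", 1)
    return f"{local}+{rcpt_local}={rcpt_domain}@{domain}"


def verp_recipient(envelope_to: str) -> Optional[str]:
    """Recover the subscriber address from a bounce's envelope recipient.

    Returns None when the address does not carry VERP data.
    """
    local, _domain = envelope_to.rsplit("@", 1)
    if "+" not in local:
        return None
    _list_part, encoded = local.split("+", 1)
    if "=" not in encoded:
        return None
    rcpt_local, rcpt_domain = encoded.rsplit("=", 1)
    return f"{rcpt_local}@{rcpt_domain}"
```

For example, `verp_sender("mylist-bounces@lists.example.org", "fred@example.com")` gives `mylist-bounces+fred=example.com@lists.example.org`, and `verp_recipient` on that address gives back `fred@example.com`.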
The queueing stuff is interesting, although big list focused boxes are likely to not be the primary users of mailman - however if the exim list is anything to go by those (big list) users will be among the most vocal and contribute most ideas and code. [I have worked on big mail systems, but not really big list systems]
claw@kanga.nu said:
Sorting the RCPT TO list by domain costs us very little (esp if we sort on insertion), and can help users of dumb MTAs considerably.
Yup...
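A sketch of "sort on insertion" in Python, using the standard bisect module so the recipient list is always in domain order and no separate sort pass is needed at delivery time. The class and function names are invented for the example.

```python
import bisect


def sort_key(addr: str):
    """Order by domain first (case-insensitively), then by local part."""
    local, _, domain = addr.rpartition("@")
    return (domain.lower(), local)


class RecipientList:
    """Keeps addresses ordered by domain as they are added."""

    def __init__(self):
        self._entries = []  # list of (sort_key, address) tuples

    def add(self, addr: str) -> None:
        # insort keeps the list sorted; each insert is a binary search
        # plus a list shift.
        bisect.insort(self._entries, (sort_key(addr), addr))

    def addresses(self):
        return [addr for _key, addr in self._entries]
```

The RCPT TO list handed to the MTA is then just `addresses()`, already grouped by domain.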
chuqui@plaidworks.com said:
You could make a good argument that the best way to optimize is to create one mail batch per unique hostname, up to SMTP-MAX-RCPTS, at which point you split it into num_addrs/SMTP-MAX-RCPTS batches for that hostname, and then let the MTA sort it out from there.
Counter examples are always problems.... The biggest UK ISP group (several "virtual" ISPs use the same bulk ISP service set) has a few million users each of whom have their own domain name - so you will find that *.freeserve.co.uk (around 2 million domains) all goes to the same batch of MXes. This means that a good approach (for this type of account naming) would be to pack in sets of addresses in reverse domain order until you had a batch of SMTP-MAX-RCPTS (obviously you additionally optimise this by also making sure that a single domain is not split over 2 batches unless the number of addresses in that domain is larger than a batch).
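The packing scheme described above could be sketched in Python roughly as follows. This is a greedy illustration under assumptions of my own: `max_rcpts` stands in for SMTP-MAX-RCPTS, the function names are invented, and no MX lookups are done, so neighbouring domains are only assumed to share MXes because they sort together.

```python
from itertools import groupby


def reversed_domain(addr: str):
    """Sort key: domain labels in reverse order, e.g. ('uk', 'co', 'freeserve', 'fred')."""
    domain = addr.rpartition("@")[2].lower()
    return tuple(reversed(domain.split(".")))


def pack_batches(addresses, max_rcpts: int):
    """Pack addresses in reverse domain order into batches of at most max_rcpts.

    A domain is split across batches only when it is bigger than a batch.
    """
    ordered = sorted(addresses, key=reversed_domain)
    batches, current = [], []
    for _key, group in groupby(ordered, key=reversed_domain):
        group = list(group)
        if len(group) > max_rcpts:
            # Oversized domain: flush what we have, then split this domain.
            if current:
                batches.append(current)
                current = []
            for i in range(0, len(group), max_rcpts):
                batches.append(group[i:i + max_rcpts])
            continue
        if len(current) + len(group) > max_rcpts:
            # This domain would not fit whole, so start a new batch for it.
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```

With a batch size of 3, the four addresses a@fred.freeserve.co.uk, b@jim.freeserve.co.uk, c@jim.freeserve.co.uk and e@zed.freeserve.co.uk come out as one batch of three and one batch of one, with jim's two addresses kept together.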
As for a quick description of exim queueing practices:-
- Queues are processed in a basically random order... incoming messages however *normally* have a delivery process invoked for them immediately after end-of-smtp-data (there is policy associated here - can be tweaked)
- Each domain/address/message has retry hints associated with it; if the retry time for a message/domain/address has not been hit then it is not taken further - so often a group of messages in the queue are skipped on each queue run because their retry time has not arrived
- Exim resolves all undelivered addresses in a message and groups them by MX (let's ignore alternative delivery schemes here)
- Each MX set has delivery attempted (there may be parallelism here)
- If the MX set can be contacted then the message is shoved down the pipe, then the hints database is checked for other messages outstanding on that MX set - if so then the pipe is passed to another delivery process invoked on one of the waiting messages
- If MX set was *not* successful then the hints are updated to say this message has addresses outstanding on that MX
So in the normal case each delivery process delivers only to the addresses in the message it's dealing with - each message is independent so you may have several SMTPs to the same place for different messages. If things clog up then hints help make things more efficient. [these are hints - sometimes they are ignored, and trashing the hints db is quite OK]. This all works pretty well in practice. If you want a particular type of efficiency you can rearrange things - ie make all messages resolve, but only deliver on queue runs, which means that messages for the same destination host are nearly always batched down a single SMTP session.
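A toy Python model of the queue run described above, to show how retry hints cause messages to be skipped. This is emphatically not Exim's code or data layout: the hints here are keyed by MX set only, the retry interval is a fixed number of seconds, the `deliver` callback is hypothetical, and the hand-off of an open connection to other waiting messages is not modelled.

```python
import time


def queue_run(queue, hints, deliver, now=None, retry_interval=900):
    """One pass over the queue.

    queue:   {msg_id: {mx_set: [addresses]}}  undelivered addresses per message
    hints:   {mx_set: next_retry_time}        may be emptied at any time
    deliver: callable(mx_set, msg_id, addresses) -> True on success
    """
    now = time.time() if now is None else now
    for msg_id, by_mx in queue.items():
        for mx_set, addresses in list(by_mx.items()):
            if hints.get(mx_set, 0) > now:
                continue  # retry time not reached: skip without trying
            if deliver(mx_set, msg_id, addresses):
                del by_mx[mx_set]
                hints.pop(mx_set, None)
            else:
                # Record that this MX set is in trouble until later.
                hints[mx_set] = now + retry_interval
    # Messages with nothing left to deliver leave the queue.
    for msg_id in [m for m, by_mx in queue.items() if not by_mx]:
        del queue[msg_id]
```

Because the hints are only an optimisation, starting again with an empty `hints` dict just means a few extra delivery attempts on the next run.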
[On per-MTA documentation] Let's start bullying^Wpersuading people to contribute some documentation on this stuff or pointers to existing MTA documentation that addresses this. The question of MTA configuration for medium size lists is pretty common, so there must be tuning data around. I guess I could collate if needed [sigh]
Big lists are a different issue - you need to *choose* your MTA and hardware within your constraints for that. Tuning is probably a consultancy job for those.
chuqui@plaidworks.com said:
There are exchange sites out there whose idea of a bounce message is to return the mail to the "to:" line with only the Message-ID changed. you can imagine how much fun THAT is.
More special bounce filters needed :-) I *like* the way that mailman is now dealing with an impressive proportion of bounces. I need to write an extra filter to make it drop delay warning messages, other than that there's very little stuff getting through to me in the way of bounces.
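For the delay-warning filter, one possible approach (a sketch only, not Mailman's actual bounce handling) is to look at the "Action" field of a standard delivery status notification and treat the message as a mere warning when every recipient block says "delayed". It uses only Python's standard email package; the sample message and the function name are invented, and non-DSN delay warnings would need their own heuristics.

```python
import email


def is_delay_warning(raw_message: str) -> bool:
    """True if the message is a DSN whose recipient actions are all 'delayed'."""
    msg = email.message_from_string(raw_message)
    actions = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        payload = part.get_payload()
        if not isinstance(payload, list):
            continue  # malformed status part; nothing to inspect
        # The parser represents each header block as a nested message.
        for block in payload:
            action = block.get("Action", "").strip().lower()
            if action:
                actions.append(action)
    return bool(actions) and all(a == "delayed" for a in actions)


# Invented sample for illustration.
DELAYED_DSN = "\n".join([
    "From: MAILER-DAEMON@mx.example.net",
    "To: mylist-bounces@lists.example.org",
    "Subject: Warning: message delayed",
    "MIME-Version: 1.0",
    'Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"',
    "",
    "--BOUND",
    "Content-Type: text/plain",
    "",
    "Your message has been delayed.",
    "",
    "--BOUND",
    "Content-Type: message/delivery-status",
    "",
    "Reporting-MTA: dns; mx.example.net",
    "",
    "Final-Recipient: rfc822; fred@example.com",
    "Action: delayed",
    "Status: 4.4.1",
    "",
    "--BOUND--",
    "",
])
```

A bounce processor could call `is_delay_warning` first and simply discard the message when it returns True, leaving real failures to the normal bounce scoring.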
That particular one you mention should be blocked from the net - presumably their upstream is clueless too.
Nigel.
-- [ - Opinions expressed are personal and may not be shared by VData - ] [ Nigel Metheringham Nigel.Metheringham@VData.co.uk ] [ Phone: +44 1423 850000 Fax +44 1423 858866 ]
At 10:32 AM +0100 5/25/2000, Nigel Metheringham wrote:
Wietse had some figures on MTA performance analysis which he used as part of the design process for Postfix. He concluded that disk I/O was *the* limiting factor for an MTA
That matches my experience. I've had good results using RAM disks to minimize this, which is sort of cheating, but worth it. And many of the structural changes to Sendmail 8.10 are aimed at reducing this impact, so they've finally figured this out, too.
If we have a million user list... and a message of a few K, I'm not sure I want to have a few GB of queue space taken up.
That's one reason in favor of having the MLM monitor MTA backlogs and throttle itself. On my systems, I try to tune things so that I give the MTA enough to chew on and get up to speed, but not so much that it starts thrashing trying to deal with queue overhead issues. I'm hoping sendmail 8.10 and down the road postfix allow me to no longer worry about this (or worry about it less...). But for large lists, it's another issue.
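A rough Python sketch of that kind of self-throttling, assuming the backlog can be estimated by counting files in the MTA's queue directory. The high-water mark, the polling interval, the `submit` callback and the directory layout are all assumptions for the example; real MTAs keep more than one file per message and may split the queue across directories.

```python
import os
import time


def queue_depth(queue_dir: str) -> int:
    """Crude backlog estimate: number of regular files in the queue directory."""
    with os.scandir(queue_dir) as entries:
        return sum(1 for entry in entries if entry.is_file())


def feed_mta(batches, submit, queue_dir, high_water=2000,
             poll_seconds=5.0, depth=queue_depth, sleep=time.sleep):
    """Hand batches to the MTA, pausing whenever its queue is too deep.

    submit: callable(batch) that injects one batch into the MTA.
    depth and sleep are parameters so they can be replaced in tests.
    """
    for batch in batches:
        while depth(queue_dir) >= high_water:
            sleep(poll_seconds)  # let the MTA drain before adding more
        submit(batch)
```

The high-water mark is the tuning knob: high enough that the MTA always has something to chew on, low enough that it never starts thrashing on queue overhead.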
Having said that I *really* would like the possibility of the occasional message being sent out using VERP (maybe even just the password reminders, although I'd prefer a method where some messages are VERPed whenever the list has recently seen bounces that it cannot tie to a particular subscriber).
It depends on the type of list. If you're a busy list, VERPing every message could well be overkill, but again, it gets back to being able to do other things as well, like pre-loading addresses into the unsubscribe URL. there's some nice user-interface improvements you can make once you have VERP to make the whole user experience much less painful...
But if you're an announce list that only comes out occasionally, sending out a monthly "here's your password" update on a list that averages 2 messages a month seems to be the wrong thing, at least to me. Because the hassle factor of the noise generated by the administrative postings starts to overwhelm the content. your users won't like that (this brings up a sub-discussion, that of the "monthly reminder message", but we won't go there now... )
Again, I think we have to remember that there are lots of different USES for mail lists, and different usage forms for those types. How you handle a twice-a-month enewsletter is going to be much different than a 40 message a day discussion list. So there's no single right answer, and configurable options to support the different flavors is a really Good Thing....
However then we also need to recode the MTA incoming handling to take that - aliases don't cut it any more.
I was thinking last night that what would REALLY, REALLY be useful here is an extended SMTP protocol that allows the VERPing to be introduced by the receiving SMTP server, rather than the delivery server or MLM. And after thinking about it, I went and lay down in a dark room until I got over it... (snicker). But if you think about it, the downside to VERP is you lose the efficiency of batching multiple addresses into a single transaction, so the solution is to extend SMTP to allow us to maintain that efficiency while building in the VERPing data at time of delivery...
And if you think that's feasible, you need to lie down in a dark room... But it's an intriguing thought....
Counter examples are always problems.... The biggest UK ISP group (several "virtual" ISPs use the same bulk ISP service set) has a few million users each of whom have their own domain name - so you will find that *.freeserve.co.uk (around 2 million domains) all goes to the same batch of MXes.
Yeah. And until I realized that you need to worry about domain names out to the fourth sub-domain, it was driving my database stuff crazy, because lookups sludged out badly. Here in the states, you get used to worrying about 2nd level domains, and maybe third, but when your audience internationalizes, the rules change... I now track domain name uniqueness out to the fourth part, just to handle places like freeserve cleanly. Otherwise, all heck breaks loose.
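The "fourth part" rule can be sketched as a small Python helper that derives a grouping key from an address. The function name and the default of four labels are illustrative; this is not the actual database code being described.

```python
def domain_key(addr: str, parts: int = 4) -> str:
    """Key made of the last `parts` labels of the address's domain."""
    domain = addr.rpartition("@")[2].lower().rstrip(".")
    labels = domain.split(".")
    return ".".join(labels[-parts:])
```

With the default, `domain_key("joe@fred.freeserve.co.uk")` is `fred.freeserve.co.uk`, so each freeserve customer domain stays distinct. With `parts=2` the same address collapses to `co.uk`, which lumps most of the UK into one bucket and is roughly where the lookups start to sludge.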
[On per-MTA documentation] Let's start bullying^Wpersuading people to contribute some documentation on this stuff or pointers to existing MTA documentation that addresses this.
As I find stuff out, I'll definitely make it available, and should be able to at least help collate.
Big lists are a different issue - you need to *choose* your MTA and hardware within your constraints for that. Tuning is probably a consultancy job for those.
I wouldn't worry too much about big stuff, either. And to be honest, once I catch up on some other stuff, I'll be setting up a mail list and other resources for big list admins.
That particular one you mention should be blocked from the net - presumably their upstream is clueless too.
Yup. They effectively ARE blocked from all of my sites. One is in Italy, for instance, although another that I've had problems with is a school site down in Scottsdale. The problem seems to be sites that are running really downrev versions of things that nobody's watching or upgrading.
-- Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com) Apple Mail List Gnome (mailto:chuq@apple.com)
And they sit at the bar and put bread in my jar and say 'Man, what are you doing here?'"