Re: [Mailman-Developers] Re: To VERP or not to VERP?
On Sun, 17 Jun 2001 08:41:56 -0700 Chuq Von Rospach <chuqui@plaidworks.com> wrote:
the amount of disk it takes isn't an issue (within reason) -- remember, it's going to start sending right away, so messages will be gone from the queue.
This is not a safe assumption. Exim for instance will defer deliveries if more than N messages are received in a single connection. As a result, you typically get no outbound deliveries going on during a qrunner broadcast.
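For reference, the Exim behaviour described here is governed by a main-section option; a minimal sketch of the relevant configure line (the value shown is Exim's documented default, kept here for illustration, not a recommendation):

    # Messages beyond this count on one SMTP connection are queued for a
    # later queue-runner pass instead of being delivered immediately.
    smtp_accept_queue_per_connection = 10

Raising the limit trades queue growth against immediate-delivery latency during a broadcast.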
You don't queue it all up and then start delivery.
Uhh.
But this generates disk I/O, and that can start bogging you down if you don't tune things properly. That means taking advantage of sub-folders in your mail queue for any MTA that allows them. (The #1 performance death for a typical sendmail system is write-locks on /var/spool/mqueue, since every sendmail process has to create files in that directory.) If you haven't set up subfolders, you are (I say this in a nice way) an idiot. If you aren't using a version of sendmail that allows for them, I'll call you an idiot in a not-nice way. Anyone not running at least 8.10 is hosed, so forget them...
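For anyone setting this up: sendmail 8.10+ accepts a wildcard in the queue directory path, which is what makes the subfolder trick work. A minimal sketch of the .mc entry (paths and the number of directories are illustrative):

    dnl Spread the queue across several directories so no single
    dnl write-locked /var/spool/mqueue becomes the bottleneck.
    dnl The matching directories (q1..q4 here) must be created by hand.
    define(`QUEUE_DIR', `/var/spool/mqueue/q*')dnl

Each matching directory gets its own share of the queue files, so lock contention on any one directory drops accordingly.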
Great README/FAQ topic.
Most MTAs are sensitive to this and work to minimize the impact. Even sendmail finally figured it out. But you still need, in large e-mail environments, to look at splitting this across heads and spindles. My experiments have indicated you're better off having mail on separate spindles than you are building a RAID using those same spindles, for whatever that's worth.
I've done similar experiments with news spools. The results are incredibly sensitive to RAID stripe size. Bob Mende, IIRC, did some interesting work here which I think he presented at LISA.
FWIW, we ended up with a 1 MB stripe size as the sweet spot, much bigger than we'd expected.
And if you have lots of RAM, you can start using RAM disks, and then you have lots of fun... (Yes, I've done that. It's amazing how much faster sendmail is when you remove the disk I/O on those directory inodes...)
Silicon disks are another one. At Critical Path we used solid state disks for /var/spool and Qmail fairly, umm, whirred.
-- J C Lawrence claw@kanga.nu ---------(*) http://www.kanga.nu/~claw/ The pressure to survive and rhetoric may make strange bedfellows
On Sunday, June 17, 2001, at 09:39 AM, J C Lawrence wrote:
Exim for instance will defer deliveries if more than N messages are received in a single connection. As a result, you typically get no outbound deliveries going on during a qrunner broadcast.
IMHO, that's seriously broken.
I've done similar experiments with news spools. The results are incredibly sensitive to RAID stripe size. Bob Mende, IIRC, did some interesting work here which I think he presented at LISA.
FWIW, we ended up with a 1 MB stripe size as the sweet spot, much bigger than we'd expected.
I know someone who did the same, and found the best performance with (I kid you not) 32 MB stripe sizes.
Silicon disks are another one. At Critical Path we used solid state disks for /var/spool and Qmail fairly, umm, whirred.
But what I've found for really large e-mail installations is that there's always another bottleneck. The bigger/faster machine paradigm just doesn't scale after a while, so what I'm working on now is a new setup that I'm calling the "army of smurfs" design. I'm going to be buying lots of small/fast/cheap boxes rather than trying to keep making that single monolithic machine do incrementally more.
-- Chuq Von Rospach, Internet Gnome <http://www.chuqui.com> [<chuqui@plaidworks.com> = <me@chuqui.com> = <chuq@apple.com>] Yes, yes, I've finally finished my home page. Lucky you.
Funny, I don't remember being absent minded.
On Sun, Jun 17, 2001 at 09:53:28AM -0700, Chuq Von Rospach wrote:
But what I've found for really large e-mail installations is that there's always another bottleneck. The bigger/faster machine paradigm just doesn't scale after a while, so what I'm working on now is a new setup that I'm calling the "army of smurfs" design. I'm going to be buying lots of small/fast/cheap boxes rather than trying to keep making that single monolithic machine do incrementally more.
If you're talking about generic large mail farms, Chuq, you *really* need to go find the Earthlink white paper on that and read it, if you haven't already. They have one on news, too. Don't recall the URL; Ask The Web<tm>.
Cheers, -- jra
Jay R. Ashworth jra@baylink.com Member of the Technical Staff Baylink The Suncoast Freenet The Things I Think Tampa Bay, Florida http://baylink.pitas.com +1 727 804 5015
OS X: Because making Unix user-friendly was easier than debugging Windows
Chuq Von Rospach <chuqui@plaidworks.com> wrote (in two different messages):
Currently I send it about 10000 messages that it breaks up one by one. I don't know about memory or disk issues there, but 130,000 * 10 KB = 1.3 GB on disk.
It's a problem. Network and disk are (IMHO) the two big performance issues in delivering e-mail -- at least the two under your control. The third is the speed at which receiving machines will accept messages, but you can't buy everyone in the universe faster e-mail servers...
the amount of disk it takes isn't an issue (within reason) -- remember, it's going to start sending right away, so messages will be gone from the queue. You don't queue it all up and then start delivery.
I think it is certainly an issue - the disk partition with the mailqueue must be big enough that even if there are network problems at the time when all your digests are mailed out, all the digests can be stored without problem.
This is not a problem with the qmail trick which I've explained at http://mailman.cis.to/qmail-verh/ because there only one copy of the message is stored in the mailqueue, which is then customized by means of pattern substitutions on-the-fly when qmail talks SMTP with the receiving mailservers.
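For readers who want the flavor of the trick without reading the patch: the idea is one queued copy plus per-recipient substitution at SMTP time. Here is a rough Python sketch of the same substitution idea -- this is not qmail-verh's actual mechanism or token syntax; the VERH_RCPT token, the relay name, and the list domain are invented for illustration:

    import smtplib

    def deliver_customized(template, recipients, relay="localhost"):
        """One stored template, customized per recipient at delivery time."""
        s = smtplib.SMTP(relay)
        try:
            for rcpt in recipients:
                # VERP-style envelope sender encodes the recipient, so a
                # bounce identifies the failing address all by itself.
                env_from = "bounce-%s@lists.example.com" % rcpt.replace("@", "=")
                body = template.replace("VERH_RCPT", rcpt)
                s.sendmail(env_from, [rcpt], body)
        finally:
            s.quit()

The point is that the N customized copies never all exist on disk at once; each is materialized only for the duration of its own SMTP transaction.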
But if Mailman customizes messages and passes one message per recipient to the MTA, then you had better make sure that you have a big enough mailqueue partition.
that I'm calling the "army of smurfs" design. I'm going to be buying lots of small/fast/cheap boxes rather than trying to keep making that single monolithic machine do incrementally more.
Ezmlm-idx has some really neat features for that, which I'd love to see implemented in Mailman.
Greetings, Norbert.
-- Norbert Bollow, Weidlistr.18, CH-8624 Gruet (near Zurich, Switzerland) Tel +41 1 972 20 59 Fax +41 1 972 20 69 nb@freedevelopers.net
@ Chuq Von Rospach (chuqui@plaidworks.com) :
But what I've found for really large e-mail installations is that there's always another bottleneck. The bigger/faster machine paradigm just doesn't scale after a while, so what I'm working on now is a new setup that I'm calling the "army of smurfs" design. I'm going to be buying lots of small/fast/cheap boxes rather than trying to keep making that single monolithic machine do incrementally more.
On my setup (postfix) there's heavy use of the /etc/postfix/transport feature, which redirects all domains I have a problem with (like those that don't always accept the mail I send) to a slave machine that is very near the first one in terms of bandwidth. It reduces the disk problems a lot (mostly the mailqueue becoming very heavy) -- maybe I should continue this way, with more domains being handed off to slave machine(s). Is that the option you're describing?
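For concreteness, the postfix setup described here looks roughly like this (the domain names and the relay host are illustrative):

    # /etc/postfix/transport -- route problem domains to the slave relay
    aol.com        smtp:[slave.example.com]
    hotmail.com    smtp:[slave.example.com]

    # main.cf
    transport_maps = hash:/etc/postfix/transport

    # rebuild the lookup table after editing
    postmap /etc/postfix/transport

Mail for the listed domains then queues and retries on the slave, keeping the main machine's queue lean.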
-- Fil
On Sunday, June 17, 2001, at 11:58 AM, Fil wrote:
On my setup (postfix) there's a heavy use of the /etc/postfix/transport feature, that redirects all domains with which I have a problem
That's what's known as the "bog" (a term I ripped off from Infobeat's architecture): try to deliver it once, then exile it to the bog for non-real-time delivery.
Infobeat's actually kinda nice, if you want to get really hard-core about it. They deliver right out of the database into the client SMTP port, effectively sucking the outgoing MTA up into the application system. I've looked hard at doing that as well, but you start getting into the grimy details of dealing with all the MX weirdness and the like, and I've decided not to write an MTA right now, but instead to let the MTA writers figure out how to best do that.
problems (mostly the mailqueue becoming very heavy) -- maybe I should continue this way, with more domains being handed off to slave machine(s). Is that the option you're describing?
Not really, but it's a good, legitimate option. When I was under sendmail 8.9, I implemented special slow queues as bogs and moved things there after 3 delivery tries. Under sendmail 8.11, I don't, because the sub-directory stuff allowed me to do away with having to deal with that from a performance standpoint.
What I'm designing is a system that will cause a daemon on a central server to connect to a bunch of machines in a farm and use them to simultaneously generate messages out into multiple SMTP queues -- you effectively hand a remote machine (an smtp smurf) a message template, then feed it data as fast as it can send. You get a very compact protocol between the two (since all you send over is the data to fill the template), and each one is nothing but a mail-creation process feeding an MTA on a dedicated machine. And the rest, as they say, is simply tuning for maximum performance without thrashing.
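A rough Python sketch of the smurf side of that protocol as described -- one template up front, then a stream of per-recipient data rows. The field names ($email, $name) and the relay are invented for this sketch:

    import smtplib
    import string

    def run_smurf(template_text, rows, env_from, relay="localhost"):
        """Fill one template per data row and feed the local, dedicated MTA."""
        tmpl = string.Template(template_text)  # e.g. "To: $email\n\nHi $name, ..."
        mta = smtplib.SMTP(relay)
        try:
            for row in rows:                   # row: {"email": ..., "name": ...}
                msg = tmpl.substitute(row)     # per-recipient customization
                mta.sendmail(env_from, [row["email"]], msg)
        finally:
            mta.quit()

Since only the row data crosses the wire, the central daemon's cost per recipient is a few dozen bytes rather than the whole 40K message.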
This isn't a discussion-list system, but the delivery setup could be adapted to that fairly easily. If you were going to do something similar in Mailman, you could do it by having a farm of SMTP outgoing machines under a round-robin with a short time-out, and making sure qrunner re-looks up the IP after every message. Do that, and then extend to allow for parallel qrunners, and you can build a heck of a mailing list farm. 2.1 will do most of that fairly easily; the rest is figuring out and configuring the SMTP smurf farm and its round-robin.
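The qrunner-side change is small; a sketch of the per-message re-lookup (the round-robin hostname is invented for illustration):

    import socket
    import smtplib

    def deliver_via_farm(env_from, rcpt, msg, rr_name="smtp-out.example.com"):
        ip = socket.gethostbyname(rr_name)  # fresh lookup each message, so the
        s = smtplib.SMTP(ip)                # round-robin spreads load over smurfs
        try:
            s.sendmail(env_from, [rcpt], msg)
        finally:
            s.quit()

With a short TTL on the round-robin record, dead or busy smurfs can be rotated out of the farm without touching the qrunner at all.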
In my case, I need to be able to take 40K-length messages and build a system that'll ship out ~1,000,000 addresses an hour, with full customization (i.e. no piggybacking of addresses). Which might explain why I did 3+ hours of math last night on the bandwidth question; I needed it anyway, I'd been meaning to get to it, and as long as we were talking about it, it was a great excuse to build a model for my 'real' work. My current system only does ~350-400,000 an hour with limited piggybacking of addresses, and doesn't do the full customization. But it also depends on a monolithic machine architecture, which simply can't scale infinitely. Right now, my big bottleneck on that machine is that my 100BaseT is full. I'm going to be bringing up a quad ethernet soon, but from there, unless you start talking about fiber and/or trunking solutions, you're done. So rather than trying to eke out another 2% at ever greater engineering cost, we're moving to a smurf model, because you can buy a bunch of small, fast machines for the cost of a big honker with gigabit ethernet and a trunking system.
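The back-of-envelope version of that bandwidth math, using the figures from the paragraph above:

    # Does 1,000,000 messages/hour at ~40 KB each fit in 100BaseT?
    msgs_per_hour = 1000000
    msg_bytes = 40 * 1024
    bits_per_sec = msgs_per_hour * msg_bytes * 8 / 3600.0
    print(bits_per_sec / 1e6)   # ~91 Mbit/s of payload

So the payload alone is roughly 91 Mbit/s; add SMTP and TCP overhead and a single 100BaseT interface is saturated, which is exactly the wall being described.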
The downside is you add administrative complexity, plus you need to engineer getting the data where it needs to be (and the security of making sure nobody else uses your smurfs...). But those seem to be manageable. My current design includes three flavors of smurfs (smtp-out, smtp-in, and web), and I'm actually working on whether it makes sense to take over a delegated sub-domain and run my own DNS, so I can dynamically move the smurfs into whatever function I need -- it'd make a LOT of sense, in machine-usage terms, to be able to dedicate all but one or two smurfs to SMTP-out during the early delivery, and then start shunting them off one at a time into an SMTP-in or web role to handle returned bounces, postmaster mail and user unsubscription requests. And once the bulge is done, take them offline and turn them into bounce-processing smurfs.... I could do more with fewer machines, but my office would end up looking like the Johnson Space Center at launch time... (grin).
-- Chuq Von Rospach, Internet Gnome <http://www.chuqui.com> [<chuqui@plaidworks.com> = <me@chuqui.com> = <chuq@apple.com>] Yes, yes, I've finally finished my home page. Lucky you.
USENET is a lot better after two or three eggnogs. We shouldn't allow anyone on the net without a bottle of brandy. (chuq von rospach, 1992)
participants (5)
- Chuq Von Rospach
- Fil
- J C Lawrence
- Jay R. Ashworth
- Norbert Bollow