[Mailman-Developers] Problem with qrunner and too much incoming mail

Chuq Von Rospach chuqui@plaidworks.com
Mon, 6 Nov 2000 23:00:32 -0800


At 11:59 PM -0500 11/6/00, Barry A. Warsaw wrote:
>experience with Wikis, which everybody at my new employer simply raves
>about.  Please visit
>
>     http://www.zope.org/Members/bwarsaw/MailmanDesignNotes

Got that bookmarked. Looks interesting.

>mistake was in making the delivery module part of that pipeline.  What
>Mailman's pipeline ought to do is the prep-work on the message only:

That's pretty much what I do now on my big mother machine. There's
a web page for posting, and it spawns a script that creates a
message file (with full headers already finished), then sucks the
subscriber list out of an SQL database, generates a set of command
files and subscriber lists, and throws them in a queueing system.
It's configurable on the fly. (FWIW, I use QPS for queueing; I
didn't write my own. GNU Queue was my first choice, but it has
problems on Solaris I didn't have time to debug...)
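
Roughly, the shape of that script, sketched in python (toy code,
not the real thing -- the path, the chunk size, and the QPS
hand-off are all stand-ins):

    # build the message once, chunk the subscriber list, and drop
    # command files into a spool directory for the queueing system
    # to pick up.
    import os, time

    QUEUE_DIR = '/var/spool/listqueue'   # invented path
    CHUNK = 500                          # recipients per command file

    def queue_post(msgtext, subscribers):
        stamp = '%d.%d' % (time.time(), os.getpid())
        msgfile = os.path.join(QUEUE_DIR, stamp + '.msg')
        f = open(msgfile, 'w')
        f.write(msgtext)                 # headers already finished
        f.close()
        # one command file per chunk of subscribers
        for i in range(0, len(subscribers), CHUNK):
            cmdfile = os.path.join(QUEUE_DIR, '%s.%d.cmd' % (stamp, i))
            f = open(cmdfile, 'w')
            f.write(msgfile + '\n')
            for addr in subscribers[i:i+CHUNK]:
                f.write(addr + '\n')
            f.close()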

>spam and privacy filtering, setting headers, updating per-list
>counters, appending to digests, etc.  Anything that does not require
>writing list-specific data could be pulled out of the pipeline.  I'm
>thinking specifically about nntp posting and the mta-handoff.

I'm beginning to think that "mailman 3.0" may end up being NOTHING
but APIs and enough glue to interface them. That way, *any* piece
can be swapped out for something equivalent if we want to, and we
can strongly isolate interactions and keep each code-base simple. A
subscription API, a spam API, a digest API, etc, etc, etc. It's not
at all that simple in practice, but in theory, you ultimately have
a set of Python classes that define an MLM...
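
In sketch form (invented names, nothing more than a shape):

    # an MLM as a set of swappable interfaces: glue code composes
    # whichever implementations a site chooses.
    class SubscriptionAPI:
        def subscribe(self, listname, addr): raise NotImplementedError
        def unsubscribe(self, listname, addr): raise NotImplementedError
        def members(self, listname): raise NotImplementedError

    class SpamAPI:
        def is_spam(self, msg): raise NotImplementedError

    class DigestAPI:
        def append(self, listname, msg): raise NotImplementedError
        def flush(self, listname): raise NotImplementedError

    class MLM:
        # the whole MLM is just the composition of the pieces
        def __init__(self, subs, spam, digest):
            self.subs = subs
            self.spam = spam
            self.digest = digest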

>  E.g. our whizzymailer would
>know the details of Mailman so when it got errors during the smtp
>transaction, it could update the db's directly.  This isn't as likely
>to happen when we handoff to a localhost MTA, unless they support DSN
>and we run them synchronously (which clobbers the current
>architecture, as we're seeing).

This is nice in theory, but again, you start BEING an MTA, and the
subtleties are going to whack you upside the head. I looked at this
a while back, and decided that was a place I didn't want to go. I
really think we need to be careful about optimizing by subverting
the MTA, because down that path lies sendmail -- which does
everything known to mankind, but nobody can figure out the
documentation...

Let the MTA be an MTA, and simply hand stuff off and process the
returns. That gives you the ability to build tools that leverage
other people's strengths, whether it's Postfix or smartbounce.
Otherwise, you're signing up for a LOT of work you don't yet
realize you'll have to do. I didn't see that until I *did* start
designing sendmail out of my system and saw the results: slower,
uglier, and I got to maintain the code myself...
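
In other words, our entire "delivery module" could shrink to
something like this (sketch; assumes the MTA listens on localhost):

    # hand the finished message to the local MTA and process the
    # returns.  smtplib is in the standard python library.
    import smtplib

    def handoff(sender, recipients, msgtext):
        mta = smtplib.SMTP('localhost')
        try:
            # sendmail() returns a dict mapping each refused
            # recipient to its SMTP error; feed that to the bounce
            # processor and let the MTA worry about everything else.
            refused = mta.sendmail(sender, recipients, msgtext)
        finally:
            mta.quit()
        return refused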

>  Would there be any disk persistence in
>case of system failure?

There has to be, but one thing sendmail did in 8.10 to boost
performance was to figure out what needed to be on disk for
recovery, and keep the rest in memory. For their purposes, that was
the df and qf files; all of the locking and status files live on
what's effectively an in-sendmail, RAM-based pseudo-disk. So stuff
relevant only to the current delivery attempt is in RAM, and the
stuff you need to decide whether or how to deliver is on disk. In a
crash, you only lose data that wouldn't be relevant after the crash
anyway.
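
Translated to our side of the fence, the split might look like this
(sketch; the file layout and names are invented):

    # durable state is only what's needed to restart delivery from
    # scratch after a crash; per-attempt state stays in memory.
    import os

    class QueueEntry:
        def __init__(self, qdir, qid, msgtext, recipients):
            self.qid = qid
            # on disk: the message body and recipient metadata
            f = open(os.path.join(qdir, qid + '.df'), 'w')
            f.write(msgtext)
            f.close()
            f = open(os.path.join(qdir, qid + '.qf'), 'w')
            f.write('\n'.join(recipients))
            f.close()
            # in RAM: lock and retry state, lost in a crash --
            # and that's fine, it wouldn't be valid afterwards
            self.locked = 0
            self.attempts = 0
            self.pending = list(recipients)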

The big problem I see in sendmail 8.9 is inode locking, especially
when it's updating the /var/spool/mqueue directory inode. sendmail
8.10 goes a long way towards fixing that with the
/var/spool/mqueue/* setup -- you can imagine the fun of 400
sendmails all trying to update their queue files in the same
directory inode. (wince)

So before we get into fancy hashing systems, let's see how we do with 
the basics -- split in/out/bounce into separate qfiles, split 
content/metadata/status/lock into separate subdirectories, and if 
necessary, allow multiple directories to further split the directory 
contention.
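
Concretely, something like this for placing files (sketch; the
root path is invented):

    # pick a file's home by queue and by kind, so the three queues
    # never contend on a single directory inode.
    import os

    QROOT = '/var/spool/mailman/qfiles'

    def qpath(queue, kind, qid):
        # queue in ('in', 'out', 'bounce');
        # kind in ('content', 'metadata', 'status', 'lock')
        d = os.path.join(QROOT, queue, kind)
        if not os.path.isdir(d):
            os.makedirs(d)
        return os.path.join(d, qid)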

In fact, a really simple way to parallelize Mailman would be to
allow multiple qfile directories; every time qrunner is spawned
from cron, it creates one instance per directory. That way, you
distribute the load out evenly and can rearrange it as you need by
adding or removing directories. No funky config file issues.
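
The cron end of that is near-trivial (sketch; the qrunner command
line is invented):

    # fired from cron: fork one qrunner per queue subdirectory.
    # adding or removing directories rebalances the load, no
    # config changes needed.
    import os

    OUTQ = '/var/spool/mailman/qfiles/out'   # invented path

    def spawn_qrunners():
        for subdir in os.listdir(OUTQ):
            path = os.path.join(OUTQ, subdir)
            if not os.path.isdir(path):
                continue
            if os.fork() == 0:
                # child: replace ourselves with a qrunner bound
                # to exactly one directory
                os.execlp('qrunner', 'qrunner', path)
        # parent exits; init reaps the children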

>Then again, since most messages don't live for very long in the queue,
>maybe the elimination of the disk i/o is worth a little instability or
>larger memory footprint.

Before you make that decision, we need to know whether the I/O is 
actually significant, and which pieces of the I/O can safely be held 
off. But in reality, I'll bet you won't find a Mailman site where the 
mailman directories are I/O bound in any significant way. We shouldn't 
try to optimize things that aren't the bottlenecks....

>Forking is pretty heavyweight, and threading has its problems too.

But for what we're talking about, the fork overhead is pretty
trivial. Forking is bad for lots of little, short-lived things;
it's good for relatively few, long-lived things. Given what the
processing cost of delivering 500 pieces of email will be, the
overhead of the fork is effectively nonexistent. If we were forking
a process per address, I'd worry about it; forking a process per
message isn't worth worrying about. Reality is somewhere in the
middle -- but the trick is to find the slow parts and speed them
up, rather than just try to speed up various things we guess might
be slow.
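
To make the shape concrete (sketch):

    # fork once per queued message, never per address.  the child
    # does the whole SMTP conversation for that message, so the
    # ~millisecond fork cost vanishes into seconds of SMTP chatter.
    import os

    def deliver_all(entries, deliver_one):
        kids = []
        for entry in entries:
            pid = os.fork()
            if pid == 0:
                deliver_one(entry)       # e.g. 500 recipients' worth
                os._exit(0)
            kids.append(pid)
        for pid in kids:
            os.waitpid(pid, 0)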

>     CVR> Splitting the inbound and outbound queue would be my first
>     CVR> thing here, and probably split bounces into a third
>     CVR> queue.
>
>Great idea.  Each queue has its own requirements, e.g. there's
>definitely been complaints about the minimum 1-minute delay for
>outgoing messages.

The outbound queue is a perfect place for a daemon to sit and make
sure there are always up to "N" messages being processed (we might
want to amend that so only one message is in process at a time for
a given list). Hmm, does it make sense to split the outqueue into
subdirs (see above) by list name? The outqueue daemon could then
round-robin the lists, to prevent a busy list from stuffing the
other lists into a corner...
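
Something like this loop, daemon-side (sketch; every name here is
invented):

    # keep up to MAXPROC deliveries in flight, at most one per
    # list, cycling over the per-list subdirs so a busy list
    # can't starve the others.
    import os, time

    OUTQ = '/var/spool/mailman/qfiles/out'
    MAXPROC = 4

    def daemon_loop(deliver):
        busy = {}                        # listname -> child pid
        while 1:
            for listname in sorted(os.listdir(OUTQ)):
                if listname in busy or len(busy) >= MAXPROC:
                    continue
                qdir = os.path.join(OUTQ, listname)
                pending = os.listdir(qdir)
                if not pending:
                    continue
                pid = os.fork()
                if pid == 0:             # child: deliver one message
                    deliver(os.path.join(qdir, pending[0]))
                    os._exit(0)
                busy[listname] = pid
            # reap finished children, freeing their list's slot
            for listname, pid in list(busy.items()):
                reaped, status = os.waitpid(pid, os.WNOHANG)
                if reaped:
                    del busy[listname]
            time.sleep(1)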

>Agreed.  I also want that feedback for list-bound messages so that
>Mailman can be notified directly from the MTA about certain types of
>delivery failures.

I wouldn't worry about this. The programming complexity makes this
a false economy. It sounds nice in theory, but I wouldn't make it a
design goal until we get other stuff in place -- if then. Bounces
are a pain in the neck, but not that nasty to deal with, and in the
places where simple background processing of bounces falls down,
this isn't really likely to help, because it's the guy three
forwards away from the subscribed address, behind a firewall, on a
Notes server, who changed his name when he got married four months
ago...

What you're really proposing, Barry, is to implement TWO bounce
processing systems: one for stuff where the delivery attempt fails
locally, and a second for stuff where the mail is delivered to an
agent that then sends a reject back. And the latter includes all of
the major ISPs (especially AOL and MSN), most major corporations
(including Apple), and basically every large site that relays mail
through firewalls. So you're doubling your work writing bounce
processing code, and it buys you very, very little. And the real
trouble cases won't be helped at all, because they're the ones that
won't get nailed until they come back through those 4 relays with
the addresses munched and the headers stripped.

>  I still worry about bottlenecks in synchronous
>mode, even with a high degree of parallelism and shallow buckets.

From the point of view of Mailman, I doubt it's really an issue. I
think you're still thinking MTA here. Mailman is NOT an MTA. You
don't want to write an MTA. You don't want to think like an MTA
writer. (see this swinging watch? you are getting sleepy, sleepy... 
you do not want to write sendmail. you do not want to integrate 
sendmail into Mailman. you are not eric allman. sendmail.cf files 
give you hives...)

>Thinking out loud: what if the API had two channels, mlm->mta and
>mta->mlm, let's call them outbound and inbound respectively.

I'd do three APIs, actually: DeliverMail, IncomingMail,
BounceMail. They might hand off to each other (for instance,
IncomingMail would recognize a bounce and forward it to
BounceMail), but they're really independent operations. I worry
that trying to build a single API would start throwing in
compromises or overloading concepts. And with three APIs, they can
be developed independently by different people -- and swapped out
independently. With a single API, that's tougher.
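
Sketched out (the bounce sniff here is deliberately crude):

    # three independent interfaces that can hand off to each other.
    # msg is an email message object with a get() for headers.
    class DeliverMail:
        def deliver(self, msg, recipients):
            raise NotImplementedError

    class BounceMail:
        def register_bounce(self, listname, msg):
            raise NotImplementedError

    class IncomingMail:
        def __init__(self, bouncer, pipeline):
            self.bouncer = bouncer       # handoff target
            self.pipeline = pipeline     # the normal post pipeline

        def receive(self, listname, msg):
            # real bounce detection is a project in itself
            if msg.get('From', '').lower().startswith('mailer-daemon'):
                self.bouncer.register_bounce(listname, msg)
            else:
                self.pipeline(listname, msg)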

>     CVR> If someone wants a rhetoric on how to scale mail list servers
>     CVR> infinitely, I'd be happy to explain how, since I've had to
>     CVR> develop an architecture to do so.
>
>If you write it up, I'll add it to the documentation.  At the very
>least, let's add it to the ZWiki.

Will do, once I get a chance. I've put it on my todo list.

>Have you played at all with the threaded delivery in SMTPDirect?
>Admittedly it's not integrated correctly with the rest of Mailman, but
>I'm still curious if the notion is salvageable.

A little. I really think the fork model needs to be used, because
the thread locks don't seem to allow enough processing
independence. The delivery stuff is going to spend most of its time
in I/O wait with kernel locks, and needs to respond quickly when
the I/O is available. I'm afraid that using threads defeats that
purpose, because the I/O completion has to be reported back to the
Python interpreter, which then has to get around to activating that
thread, and by the time it does, you've lost the performance edge,
especially when you're talking about a number of these fighting for
the single CPU resource.

This is a case where the fork overhead is trivial compared to the job 
overhead, and you really want these independent and down in the 
kernel, since you're dealing with stuff the kernel is best suited to 
resolve.
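
What I'm picturing is something like this (sketch; the chunking
and the localhost MTA are assumptions):

    # split the recipient list and fork a child per chunk.  each
    # child blocks in the kernel on its own socket, and the kernel
    # wakes it the moment I/O is ready -- no interpreter between.
    import os, smtplib

    def parallel_deliver(sender, msgtext, recipients, nprocs=4):
        if not recipients:
            return
        chunk = (len(recipients) + nprocs - 1) // nprocs
        kids = []
        for i in range(0, len(recipients), chunk):
            pid = os.fork()
            if pid == 0:
                mta = smtplib.SMTP('localhost')
                mta.sendmail(sender, recipients[i:i+chunk], msgtext)
                mta.quit()
                os._exit(0)
            kids.append(pid)
        for pid in kids:
            os.waitpid(pid, 0)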

-- 
Chuq Von Rospach - Plaidworks Consulting (mailto:chuqui@plaidworks.com)
Apple Mail List Gnome (mailto:chuq@apple.com)

Be just, and fear not.