[Mailman-Developers] Mailman queue design problem?

Fri, 22 Jun 2001 11:54:11 -0400

>>>>> "CVR" == Chuq Von Rospach <chuqui@plaidworks.com> writes:

    CVR> Take a really large mailman system, one that's outgrown a
    CVR> single machine. Add a dedicated delivery machine. Now,
    CVR> outgrow that. So add a second. Right now, you'd have to do
    CVR> that with some DNS round-robin magic, or hacking the
    CVR> code. Instead, allow defining 1-N outgoing queues to
    CVR> different MTAs, and have mailman place every outgoing message
    CVR> in one of them either in sequence or random (or make it
    CVR> configurable). Maybe the best way to do this is to do it
    CVR> randomly, but assign a percentage to each queue, so you could
    CVR> weight towards faster/bigger machines and away from
    CVR> smaller/slower ones.

I /think/ this could all be accomplished with the current setup
without having to create a separate qfiles subdirectory.  Here's how:

Remember that every file in a qfile subdirectory is assigned a sha1
hash value, based currently on the message's content, the list name
and time.time().  Because of sha1's hashing properties the
distribution of files into the hash space should be effectively random
(I'm sure Tim Peters, if he were a member of this list, would chime in
with all the mathematical rigor I'm glossing over here. :)

Now, when a Runner asks its Switchboard for the list of files it
should process, it will receive the files sitting in some slice of
this hash space.  Currently, the code splits the hash space up evenly,
but there is no reason why someone couldn't subclass Switchboard to
allow for weighted hash space slices.
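
A minimal sketch of what weighted hash space slices could look like,
in standalone Python rather than as a real Switchboard subclass.  The
names file_hash(), slice_bounds() and slice_for() are hypothetical,
and the way the content, list name and timestamp are fed into sha1 is
an assumption, not necessarily how Mailman combines them:

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 160  # sha1 digests are 160 bits wide


def file_hash(content: bytes, listname: str, now: float) -> int:
    """Hash a queue entry to an integer in [0, HASH_SPACE).

    Assumption: the three inputs are simply fed to sha1 one after another.
    """
    h = hashlib.sha1()
    h.update(content)
    h.update(listname.encode("utf-8"))
    h.update(repr(now).encode("ascii"))
    return int(h.hexdigest(), 16)


def slice_bounds(weights):
    """Return the exclusive upper bound of each slice, sized by weight."""
    total = sum(weights)
    bounds = []
    running = 0
    for weight in weights:
        running += weight
        bounds.append(HASH_SPACE * running // total)
    return bounds


def slice_for(hashval, bounds):
    """Return the index of the slice that contains hashval."""
    return bisect_right(bounds, hashval)


if __name__ == "__main__":
    # Weight the first slice three times as heavily as the second.
    bounds = slice_bounds([3, 1])
    counts = [0, 0]
    for i in range(1000):
        counts[slice_for(file_hash(b"body %d" % i, "test-list", float(i)), bounds)] += 1
    print(counts)
```

With weights of 3 and 1, roughly three quarters of the files should
land in the first slice, since the hash values are spread evenly over
the whole space.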

    CVR> Then, each one has a qrunner thing delivering into that SMTP
    CVR> port. And for redundancy, if the server it's supposed to send
    CVR> to is down, that qrunner could requeue to the other outgoing
    CVR> queue(s), so a down machine wouldn't affect you.

So now what you do is subclass OutgoingRunner so that it knows about
all the parallelism and redundancy you want to build in, i.e.  the
assignments of hash space slices to SMTP ports.  Because each
OutgoingRunner will only process the files in its hash space slice,
you don't have to worry about locking access to the files in
qfiles/out.
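
To illustrate why no locking is needed, here is a rough sketch of the
slice-to-SMTP-host assignment.  SliceRunner and build_runners() are
hypothetical names, not Mailman classes, and the sketch assumes each
queue file's base name is its sha1 hex digest.  Because the slices are
half-open and contiguous, every file belongs to exactly one runner:

```python
from dataclasses import dataclass

HASH_SPACE = 2 ** 160  # sha1 digests are 160 bits wide


@dataclass(frozen=True)
class SliceRunner:
    """Hypothetical stand-in for an OutgoingRunner bound to one SMTP host."""
    smtp_host: str
    smtp_port: int
    lo: int  # inclusive lower bound of this runner's slice
    hi: int  # exclusive upper bound of this runner's slice

    def owns(self, filebase: str) -> bool:
        # Assumption: the file's base name is the sha1 hex digest.
        return self.lo <= int(filebase, 16) < self.hi


def build_runners(targets):
    """Build one runner per (host, port, weight) tuple.

    The slices tile the hash space with no gaps and no overlaps.
    """
    total = sum(weight for _, _, weight in targets)
    runners = []
    running = 0
    for host, port, weight in targets:
        lo = HASH_SPACE * running // total
        running += weight
        hi = HASH_SPACE * running // total
        runners.append(SliceRunner(host, port, lo, hi))
    return runners


if __name__ == "__main__":
    # Example host names are placeholders.
    for runner in build_runners([("mta1.example.com", 25, 2),
                                 ("mta2.example.com", 25, 1)]):
        print(runner.smtp_host, hex(runner.lo), hex(runner.hi))
```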

    CVR> Run it one weird step further out, and you could define
    CVR> outgoing queues that are NEVER used, unless the main SMTP
    CVR> queue is down. Sort of like a fallback MX.

If you really wanted to do this, then you /would/ have to define a
separate queue.  I'm not sure it's worthwhile (since the same goal
can probably be accomplished in other ways), but it's easy to do
something like:

- subclass OutgoingRunner so that if the SMTPDirect delivery failed
  due to non-responsiveness of the SMTP host, requeue the message to a
  `backoff' queue.  New queues (i.e. qfiles/subdirs) are trivial to
  define.

- Run a low-grade OutgoingRunner that only looks in qfiles/backoff for
  files to deliver.
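
The two steps above can be sketched roughly as follows.  This is not
Mailman code: deliver_or_backoff() and the send callable are
hypothetical, queue entries are treated as plain files, and the set of
exceptions taken to mean "the SMTP host is not responding" is an
assumption.  Other SMTP errors (e.g. refused recipients) are left to
propagate, since they are not a sign that the host is down:

```python
import os
import shutil
import smtplib
import socket

# Assumption: these exceptions indicate an unresponsive SMTP host.
UNRESPONSIVE = (
    ConnectionError,
    TimeoutError,
    socket.timeout,
    socket.gaierror,
    smtplib.SMTPConnectError,
    smtplib.SMTPServerDisconnected,
)


def deliver_or_backoff(path, backoff_dir, send):
    """Try to deliver one queue file; requeue it if the host is down.

    send is any callable taking the file path and raising on failure.
    Returns "delivered" or "backoff".
    """
    try:
        send(path)
    except UNRESPONSIVE:
        # Move the entry into the backoff queue for a later retry.
        os.makedirs(backoff_dir, exist_ok=True)
        shutil.move(path, os.path.join(backoff_dir, os.path.basename(path)))
        return "backoff"
    # Delivery succeeded, so the queue entry is no longer needed.
    os.remove(path)
    return "delivered"
```

A second, lower-priority runner would then call the same function on
the files in the backoff directory, passing a send callable that talks
to the fallback SMTP host.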

    CVR> All of this is possible outside of mailman -- but it seems
    CVR> like it ought to be fairly easy to build into mailman, so you
    CVR> don't have to fall into DNS magic or proxies or any of the
    CVR> stuff we talked about last week...

I agree that it should be easy to do, and claim that the current
architecture could support it, if a sufficiently motivated Python
hacker were willing to spend a couple of hours on it.  I don't think
it's of general use to the majority of Mailman sites, so I don't
intend to implement it myself.  As always, patches are welcome. :)

    >> 3) Different qrunners can be assigned different priorities
    >> (i.e. you can run your incoming posts and MTA-bound queues more
    >> often than your archiver, nntpd, or command processing queues).

    CVR> Can you define bounce processing to run at times the server
    CVR> is otherwise idle?

Currently, bounce processing is tied in with the CommandRunner, but it
probably makes a lot more sense to split these into separate queues.
Given the current semantics (that bounces go to -admin, which may also
receive legitimate correspondence) some non-bounce email might get
delayed delivery, but I'm trying to sanitize the return addresses for
Mailman generated messages, so that responses from users go to the
-owner address (which doesn't do bounce processing), while errors go
to the -admin address (which does).

As far as delaying bounce processing for system idle periods, once the
bounce runner is split from the command runner, a strategy would be to
override the default BounceRunner's _doperiodic() and _snooze()
methods so that it only wakes up and starts processing during idle
times (however that's calculated for your particular OS).
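
As a rough illustration of the idle-time idea, here is a standalone
sketch.  IdleOnlyRunner and wait_for_idle() are hypothetical and do
not reproduce the real signatures of _doperiodic() or _snooze(); this
is just the kind of logic such overrides might carry.  "Idle" is
assumed to mean a one-minute load average at or below a threshold,
read with os.getloadavg(), which is only available on Unix-like
systems:

```python
import os
import time


class IdleOnlyRunner:
    """Hypothetical runner that only proceeds when the machine is idle."""

    def __init__(self, max_load=0.5, sleep_seconds=60,
                 loadavg=None, sleep=time.sleep):
        self.max_load = max_load
        self.sleep_seconds = sleep_seconds
        # Default: one-minute load average (Unix only).
        self._loadavg = loadavg or (lambda: os.getloadavg()[0])
        self._sleep = sleep

    def is_idle(self):
        """Return True when the load is at or below the threshold."""
        return self._loadavg() <= self.max_load

    def wait_for_idle(self):
        """Sleep in fixed steps until the machine looks idle."""
        while not self.is_idle():
            self._sleep(self.sleep_seconds)
```

The load source and the sleep function are passed in as parameters so
that a site can substitute whatever idleness measure makes sense for
its OS.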

I'm convinced almost all the nuts and bolts are there for you to build
whatever erector set frankenstein you want. :)

-Barry