[Mailman-Developers] Re: [Mailman-Users] Problem with qrunner and too much incoming mail

J C Lawrence claw@kanga.nu
Sat, 04 Nov 2000 09:06:39 -0800


On Sat, 4 Nov 2000 10:30:58 -0500 (EST) 
Barry A Warsaw <barry@wooz.org> wrote:

> So this new process, let's call it `bulkmail'.  Bulkmail would
> have one (probably) unix socket open to take new outgoing messages
> from qrunner.  It'd probably write them to disk as a backup so
> failures don't drop messages.  I'm thinking it would then sort
> recipients based on domains, and then it would start resolving MX
> records, caching the results.  There'd be bins for each MX
> containing pointers to the messages that need to be delivered to
> that MX.  As more messages came in for that MX, they'd be dropped
> at the end of the bin.

> Once a connection to the MX is established, bulkmail would then
> just start delivering messages to it until the bin was emptied.
> Any i/o blocks in any of the processes will allow async* to switch
> to a different delivery channel.  We may need to do some explicit
> channel management to make sure some are not starved.

Ouch.  I really don't like this idea.  For one, it makes the
processing of outbound mail from a list server differ from the rest of
the mail system, both on the local host and on the local 'net (consider
smarthosts, intermittent connectivity, local domain based mail
routing, firewalling etc).  It also requires a long running
process, which then needs to tolerate unexpected or frequent
shutdowns (laptops, home machines, etc).

As discussed previously amongst Chuq, Nigel and me, the needs of
large list server systems are rather different from the normal home
hobbyist requirements, but are not completely alien.  However, the
needs of very large list installations (cf ListServ, Egroups, or
SourceForge) are rather different yet again.  I'm not convinced of
the value in beating on Mailman to support the (comparatively rare
if high profile) very large installations when the current (much
larger and more common Mailman-wise) mid-size realm still needs
attention.  Certainly, such changes should not detract from
Mailman's current level of suitability for smaller installations.

Now that said, the handling of Mailman's outbound queue
and its hand off to the local or a remote smarthost-like MTA could
stand considerable improvement.  I've posted a couple of ideas which
would require relatively small code changes and which should
improve the situation.  However, both ideas were deliberately
attempting to minimise code impact.

What would be a really good approach without concern for code
impact?  I suspect a modified form of the hash tree for queue
storage (cf QMail's implementation minus the silly (for this use)
inode specifics) with a slightly perverted form of your (Barry's)
long running bulkmailer to process that hash queue.  I'd tend to
make the bulkmailer actually an intermittently running item, to help
support intermittently connected nodes.  Say something like:

  Cron launches the bulkmailer.
  The bulkmailer forks N children processing the queue.
  The bulkmailer exits upon an empty queue.
  Should cron launch a new bulkmailer when the previous incarnation
    hasn't exited yet, the new instance merely exits immediately.
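
A minimal sketch of that launch cycle, in Python since that is what
Mailman is written in.  Everything named here is hypothetical
(`acquire_instance_lock`, `run_bulkmailer`, the `process_queue`
callback, the lock file path), and the collision check uses a
non-blocking `flock` on a lock file, which is only one of several
possible choices and assumes a POSIX host.  Each child is expected to
return from `process_queue` once it finds the queue empty.

```python
import fcntl
import os


def acquire_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if another
    bulkmailer instance already holds it (hypothetical helper)."""
    lock_file = open(lock_path, "w")
    try:
        # Non-blocking: fail at once instead of waiting for the holder.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


def run_bulkmailer(queue_dir, lock_path, n_children, process_queue):
    """One cron-launched run.  Returns False if a previous incarnation is
    still active, True once all children have finished."""
    lock_file = acquire_instance_lock(lock_path)
    if lock_file is None:
        return False  # previous incarnation still running: just exit
    try:
        pids = []
        for _ in range(n_children):
            pid = os.fork()
            if pid == 0:
                # Child: drain the queue, then leave without running the
                # parent's cleanup.  A nonzero status signals a failure.
                status = 1
                try:
                    process_queue(queue_dir)
                    status = 0
                finally:
                    os._exit(status)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
    finally:
        lock_file.close()  # closing the file releases the lock
    return True
```

Cron would call something like `run_bulkmailer(queue_dir, lock_path,
4, deliver_all)` every few minutes, with all four arguments being
placeholders for whatever the installation actually uses.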

Locking for the above is fairly simple.  Standard IPCs can be used
for the instance collision checks.  Locking on the hash queues could
be a bit interesting from a portability and performance vantage given
the fact that the list side will be attempting to deliver into the
same tree at the same time that deliveries are happening (no more
lock collisions please) -- which pretty much requires that locking
be on the queue-entry level rather than the hash bucket level.  Not
rocket science, just a bit finicky.
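
To make the queue-entry level idea concrete, here is a rough Python
sketch.  It is not a description of qmail's or Mailman's actual queue
code: the layout, the function names (`bucket_for`, `enqueue`,
`pending`, `claim`), the `.msg`/`.tmp`/`.work.PID` suffixes and the
fanout of 16 are all assumptions made for illustration.  Instead of a
lock file per entry it swaps in an atomic `rename` to claim an entry,
which on a single POSIX filesystem lets exactly one worker win and
never blocks the list side delivering into the same bucket.

```python
import hashlib
import os


def bucket_for(queue_root, entry_name, fanout=16):
    """Map an entry name to one of `fanout` bucket directories."""
    digest = hashlib.sha1(entry_name.encode("utf-8")).hexdigest()
    return os.path.join(queue_root, "%02d" % (int(digest, 16) % fanout))


def enqueue(queue_root, entry_name, data):
    """List side: write under a temporary name, then rename into place so
    that workers never see a half-written entry."""
    bucket = bucket_for(queue_root, entry_name)
    os.makedirs(bucket, exist_ok=True)
    tmp_path = os.path.join(bucket, entry_name + ".tmp")
    with open(tmp_path, "wb") as f:
        f.write(data)
    final_path = os.path.join(bucket, entry_name + ".msg")
    os.rename(tmp_path, final_path)
    return final_path


def pending(queue_root):
    """Yield the paths of all unclaimed entries, bucket by bucket."""
    for bucket in sorted(os.listdir(queue_root)):
        bucket_dir = os.path.join(queue_root, bucket)
        if not os.path.isdir(bucket_dir):
            continue
        for name in sorted(os.listdir(bucket_dir)):
            if name.endswith(".msg"):
                yield os.path.join(bucket_dir, name)


def claim(path):
    """Delivery side: try to take ownership of a single entry.  Returns
    the claimed path, or None if another worker got there first."""
    claimed = path[:-len(".msg")] + ".work.%d" % os.getpid()
    try:
        os.rename(path, claimed)
    except FileNotFoundError:
        return None
    return claimed
```

A worker would loop over `pending()`, call `claim()` on each path,
skip the ones that return None, and remove the claimed file once the
MTA has accepted the message.  Recovering entries left behind by a
worker that died mid-delivery is not handled here.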

Will this handle SourceForge?  Probably.  Certainly given a local
MTA that does no checking on mail from localhost (and which possibly
immediately respools out to a dedicated smarthost) it would
significantly improve the current state.  Enough?  Dunno.  I'd need
more specific data on their current setup, metrics etc.

-- 
J C Lawrence                                 Home: claw@kanga.nu
---------(*)                               Other: coder@kanga.nu
http://www.kanga.nu/~claw/        Keys etc: finger claw@kanga.nu
--=| A man is as sane as he is dangerous to his environment |=--