[Mailman-Developers] (long) queue problems: an analysis

Scott scott@chronis.icgroup.com
Fri, 2 Oct 1998 18:22:27 -0400


On Fri, Oct 02, 1998 at 03:55:30PM -0400, Scott wrote:
| the mailman outgoing mail queue has a number of concurrency-control issues. 

[see previous post]

| So we need a mail queuing architecture that will address all these
| issues.   

Here's an idea:

1) we alter contact_transport so that it does not try to process the
   queue anymore.  it would only deal with the delivery at hand. 

2) we create a 2-part mailqueue inside
   mm_cfg.DATA_DIR/mqueue/{active,deferred}.  when we enqueue a
   message for delivery, we put it in mqueue/active/<qfilename>.  If
   the delivery succeeds, we unlink the file.  If it fails, we rename
   the file to mqueue/deferred/<qfilename>.  All mail queue files in
   active/ will be handled by a single process under the current
   delivery mechanism, so no concurrency control is necessary for
   active/ queue files. this would involve changes to TrySMTPDelivery,
   and the installation procedure.

3) we alter OutGoingQueue.enqueueMessage so that it can handle coming
   up with unique filenames under this 2-part mail queue mechanism.

4) we alter OutGoingQueue.processQueue so that it creates a site-wide
   queue_run lock file to prevent more than one queue run from
   happening at a time.  this process will also check the active/
   queue files for files whose modification/creation time is older
   than some configurable amount of time (on the order of 1hr-1day).
   It will rename each of these files into the deferred/ part of the
   queue before proceeding to process them.  These 'stale'
   queue files would only come about as a result of system crashes or
   memory errors or similar serious system related and unpredictable
   errors that can happen in the middle of an smtp transaction.
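
To make steps 2-4 concrete, here is a rough sketch of the queue mechanics. It is not the actual Mailman code: every function name here (make_queue, enqueue, deliver, sweep_stale, acquire_run_lock, release_run_lock) is made up for illustration, and the filename scheme, the file mode and the send callable are assumptions.

```python
import os
import time
import uuid


def make_queue(base):
    """Create base/active and base/deferred and return both paths."""
    active = os.path.join(base, "active")
    deferred = os.path.join(base, "deferred")
    os.makedirs(active, exist_ok=True)
    os.makedirs(deferred, exist_ok=True)
    return active, deferred


def enqueue(active, data):
    """Write a message into active/ under a unique filename (step 3)."""
    # Hypothetical naming scheme: timestamp, pid and a random suffix.
    name = "%d.%d.%s" % (int(time.time()), os.getpid(), uuid.uuid4().hex)
    path = os.path.join(active, name)
    # O_EXCL makes the create fail instead of overwriting an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o660)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def deliver(path, deferred, send):
    """Try one delivery: unlink on success, rename to deferred/ on failure."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        send(data)  # send is a stand-in for the real SMTP delivery call
    except Exception:
        os.rename(path, os.path.join(deferred, os.path.basename(path)))
        return False
    os.unlink(path)
    return True


def sweep_stale(active, deferred, max_age, now=None):
    """Move active/ files older than max_age seconds into deferred/ (step 4)."""
    now = time.time() if now is None else now
    moved = []
    for name in sorted(os.listdir(active)):
        src = os.path.join(active, name)
        try:
            if now - os.stat(src).st_mtime > max_age:
                os.rename(src, os.path.join(deferred, name))
                moved.append(name)
        except FileNotFoundError:
            # The delivering process finished with this file in the meantime.
            continue
    return moved


def acquire_run_lock(lock_path):
    """Atomically create the queue_run lock file; return True if we got it."""
    try:
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o660)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True


def release_run_lock(lock_path):
    """Remove the queue_run lock file at the end of a queue run."""
    os.unlink(lock_path)
```

A queue run would call acquire_run_lock first, then sweep_stale, then work through deferred/, and finally release_run_lock. The sketch does not handle a lock file left behind by a crashed queue run; that would need its own staleness check.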

the above scheme works in theory only when run_queue runs as root or as
the uid that owns all the queue files.  I believe that it is possible for
queue files to be owned by both the uid of the cgi and the uid of the
local mail delivery agent.  If this is the case, then either run_queue
will have to be run as root, or all processes creating a queue file
will have to setuid mailman before creating the file.  Are there any
preferences on which of these two approaches would be best?
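
One way to find out whether mixed ownership actually occurs on a given install is to list the queue files that belong to some other uid. This is only a sketch: foreign_owned is a hypothetical helper, not part of Mailman, and it assumes a Unix system where os.geteuid() is available.

```python
import os


def foreign_owned(queue_dir, uid=None):
    """Return the names of files in queue_dir not owned by uid.

    uid defaults to the effective uid of the calling process.
    """
    uid = os.geteuid() if uid is None else uid
    return sorted(
        name
        for name in os.listdir(queue_dir)
        if os.stat(os.path.join(queue_dir, name)).st_uid != uid
    )
```

Running this as the mailman user against both active/ and deferred/ should return an empty list if every queue file is created under the same uid.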

the above scheme should not affect delivery rates much at all, since
the TrySMTP process would be the same except that it would have to add
a rename() operation if delivery failed.  There would be no contention
over locks for most deliveries.  deliveries that are deferred would be
handled in a sequential manner, but even that should be ok since each
message in the queue can have up to some very large number of
recipients.  (on an unrelated note - has anyone bumped up against rcpt
limits with mailers yet?) 

if there aren't any concerns over this approach, i'll go ahead and
code it -- starting monday.  should take a day or two to code and
test. 

scott