[Mailman-Developers] Some refinements coming

bwarsaw@python.org bwarsaw@python.org
Thu, 4 May 2000 18:41:21 -0400 (EDT)

Some time over the next few days I'm going to check in a bunch of
related refinements which should make message delivery much more
robust than it is in 2.0beta2.  In brief, this expands on the
mechanism that SMTPDirect.py used to queue up files for which there
were delivery failures to some or all recipients.

For example, here's a scenario that could cause messages under beta2
to get dropped, but which will be handled more gracefully in beta3.
Say there's some long-running process that is claiming a list lock,
e.g. bin/withlist.  A message arrives destined for the same list, but
the lock lifetime is fairly long and scripts/post can't acquire the
list lock.  Maybe you're running Postfix and you hit
command_time_limit, SIGKILLing post (ouch!).  That incoming message is
lost forever, and That's Bad (tm).

What you can do now is set the list lock acquisition timeout to
something shorter than command_time_limit.  This would cause
scripts/post to get a TimeoutError on list lock acquisition, which it
can catch and use to queue the message.  cron/qrunner can then attempt
redelivery the next time it runs.
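
As a rough sketch of that idea (this is not the actual Mailman code;
LockTimeoutError, acquire_list_lock(), post() and the in-memory queue
are all made-up names for illustration, and a plain threading.Lock
stands in for the list lock):

```python
import threading


class LockTimeoutError(Exception):
    """Hypothetical stand-in for the TimeoutError raised on lock acquisition."""


def acquire_list_lock(lock, timeout):
    """Try to take the list lock, giving up after `timeout` seconds."""
    if not lock.acquire(timeout=timeout):
        raise LockTimeoutError("could not acquire list lock in %ss" % timeout)


def post(msg, lock, queue, timeout):
    """Deliver msg if the lock is available in time, otherwise queue it.

    `timeout` should be shorter than the MTA's own limit, so that we get
    a chance to queue the message before the MTA kills the process.
    """
    try:
        acquire_list_lock(lock, timeout)
    except LockTimeoutError:
        # Don't lose the message: park it for a later redelivery attempt.
        queue.append(msg)
        return "queued"
    try:
        # Real delivery would happen here while the lock is held.
        return "delivered"
    finally:
        lock.release()
```

The point is only that the timeout turns a silent kill into an
exception the script can handle by saving the message.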

The new scheme will also mean that delivery failures generated by
Sendmail.py will be queued just like for SMTPDirect.py.  Also, if
there are any uncaught exceptions in a handler module, the message
will be queued instead of being lost (with a traceback showing up in
logs/errors so you can fix it and attempt redelivery).

One of the cool things is that a message knows how far in the pipeline
it got, so when qrunner attempts redelivery, it'll pick up where it
left off before, skipping modules that succeeded the first time.  Note
though that if you change the pipeline after the message has been
queued, the message will not flow through the new pipeline.
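
Here is a minimal sketch of how such resumption could work, assuming
the remaining handler names are stored alongside the queued message.
The names deliver(), state, handlers and errors are hypothetical and
not taken from the Mailman source:

```python
import traceback


def deliver(msg, state, handlers, default_pipeline, errors):
    """Run msg through the pipeline, resuming from saved state if present.

    state: dict kept with the queued message; state["pipeline"] holds the
           names of the handlers that still have to run.
    handlers: dict mapping handler name to a callable taking the message.
    errors: list standing in for the error log.
    Returns True when every handler succeeded, False if msg must be queued.
    """
    # A previously queued message carries its own pipeline, so a changed
    # default_pipeline does not affect it.
    remaining = list(state.get("pipeline", default_pipeline))
    while remaining:
        name = remaining[0]
        try:
            handlers[name](msg)
        except Exception:
            # Record the traceback and remember where we stopped.  The
            # failing handler stays first so it is retried next time.
            errors.append(traceback.format_exc())
            state["pipeline"] = remaining
            return False
        remaining.pop(0)
    state["pipeline"] = []
    return True
```

On a second call with the same state, handlers that already succeeded
are not run again, and the stored pipeline wins over whatever the
default has become in the meantime.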

These changes touch many pieces of Mailman and I wish I didn't have to
do this so deep into the beta cycle, but I think it's really, really
important to make sure that messages never get lost.  Currently, the
API to delivery modules has changed (which also sucks for a beta
release), but I think I'm going to fix that before beta3 goes out.

This stuff is also only minimally tested, so don't go updating your
production systems yet.  I'm not sure I figured out all the nuances or
entry points, but my simple tests work.  I want to check it all in so
there's a cvs record of the changes, many of which have been sitting
in my devel directory for days.  Also, it'll be nice to have more eyes
look at this stuff.

Stay tuned.