Re-tries for failed SMTPDirect deliveries
Last night, I added some code to queue messages that fail delivery when using SMTPDirect. What happens is this:
If a message either totally fails delivery (e.g. the smtp socket connect fails) or partial delivery fails for some, but not all, recipients, then the message is stored on the file system for a re-try later.
For every failed message, two files are created in a new directory called `qfiles'. The base name of both files is the SHA hexdigest of the message text, which should be nearly guaranteed unique. The first file is the complete plain text of the failed message. The second is a marshal of useful information about the failed delivery: the listname, the list of failed recipients, and a few other moderately useful bits of info.
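A minimal sketch of the naming scheme just described, in present-day Python (hashlib instead of the old sha module). The function name enqueue, the '.db' suffix, and the metadata keys are assumptions made for illustration; only the qfiles directory and the '.msg' suffix appear in this thread.

```python
import hashlib
import marshal
import os


def enqueue(queue_dir, text, listname, failed_recips):
    """Store a failed message as <sha>.msg plus a marshalled metadata file."""
    os.makedirs(queue_dir, exist_ok=True)
    # The base name is the SHA hex digest of the message text.
    filebase = hashlib.sha1(text.encode("utf-8")).hexdigest()
    msgfile = os.path.join(queue_dir, filebase + ".msg")
    dbfile = os.path.join(queue_dir, filebase + ".db")  # suffix is assumed
    # First file: the complete plain text of the message.
    with open(msgfile, "w", encoding="utf-8") as fp:
        fp.write(text)
    # Second file: a marshal of the delivery state (keys are hypothetical).
    with open(dbfile, "wb") as fp:
        marshal.dump({"listname": listname, "recips": list(failed_recips)}, fp)
    return filebase
```

Because the name is derived from the text, queueing the same message twice overwrites the same pair of files rather than creating duplicates.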
There's a new cron script called `qrunner' which cruises the files in qfiles. It claims a lock (to prevent multiple qrunner processes from running at once) and then goes through each file it finds, attempting redelivery. If there are any problems reading a qfile, it logs a message and skips the file until next time, on the assumption that the problem is transient. When qrunner notices that the message has been handed off to the smtp daemon for all outstanding recipients, it deletes the two message files.
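The loop described above could look roughly like the following. This is a sketch under stated assumptions, not the actual qrunner: the lock file name, the '.db' metadata suffix, the "recips" key, and the deliver callback are all hypothetical.

```python
import logging
import marshal
import os


def run_queue(queue_dir, deliver):
    """Retry every queued message once; return the number fully delivered.

    `deliver(text, data)` is a hypothetical callback returning the
    recipients that still failed (an empty list means success).
    """
    lockfile = os.path.join(queue_dir, "qrunner.lock")  # assumed name
    try:
        # O_EXCL makes creation fail if another runner holds the lock.
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return 0
    done = 0
    try:
        for name in sorted(os.listdir(queue_dir)):
            if not name.endswith(".msg"):
                continue
            msgfile = os.path.join(queue_dir, name)
            dbfile = msgfile[:-4] + ".db"
            try:
                with open(msgfile, encoding="utf-8") as fp:
                    text = fp.read()
                with open(dbfile, "rb") as fp:
                    data = marshal.load(fp)
            except (OSError, EOFError, ValueError, TypeError) as e:
                # Assume a transient problem: log it and retry next run.
                logging.warning("skipping %s: %s", name, e)
                continue
            remaining = deliver(text, data)
            if remaining:
                # Remember who is still outstanding for the next run.
                data["recips"] = list(remaining)
                with open(dbfile, "wb") as fp:
                    marshal.dump(data, fp)
            else:
                os.remove(msgfile)
                os.remove(dbfile)
                done += 1
    finally:
        os.close(fd)
        os.remove(lockfile)
    return done
```

A stale lock left behind by a crashed runner would block later runs in this sketch; a real lock needs a timeout or an owner check.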
I've moderately tested this stuff with total delivery failure by shutting off my smtp daemon, attempting some deliveries, turning it back on and running qrunner. I don't have the time right now to test partial delivery failures, but I still claim that without DSN support, these will be unlikely. Hopefully some of you can help look at this.
I'm about to check all this stuff in. Let me know what you think. -Barry
On Tue, Mar 28, 2000 at 12:12:00PM -0500, Barry A. Warsaw wrote:
I'm about to check all this stuff in. Let me know what you think.
I'll see if I can do some checks on partial delivery failure tomorrow. (I really need to get myself a separate box for testing ;)
But, aside from delivery, it might be useful to store messages which failed elsewhere in the pipeline too; messages in the archive pipe, for instance, or the usenet pipe. It can currently happen, for instance because of a deadlock, that messages just get lost. I haven't looked at the new code yet, but imho it shouldn't be too hard to push messages back into those pipelines, assuming they fail 'cleanly' (and not with files half-written or some such.)
(Then again, I haven't seen failures at all, yet, so I'm not too worried for myself.)
-- Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
"TW" == Thomas Wouters <thomas@xs4all.net> writes:
TW> But, aside from delivery, it might be useful to store
TW> messages which failed elsewhere in the pipeline too; messages
TW> in the archive pipe, for instance, or the usenet pipe. It can
TW> currently happen, for instance because of a deadlock, that
TW> messages just get lost. I haven't looked at the new code yet,
TW> but imho it shouldn't be too hard to push messages back into
TW> those pipelines, assuming they fail 'cleanly' (and not with
TW> files half-written or some such.)
That's actually a good idea. I think a wrapper around the pipeline loop, perhaps using a bare try/except (hmm...) is the way to go. What you'd probably have to do is have a checklist of delivery modules so you know 1) which ones you wanted to send the message through; 2) which ones failed. And then to know what the disposal is for a message that failed at a particular step. Definitely more complicated, but worth thinking about. Robustifying message delivery should be very high on the list, but for 2.0 final we'll have to find a happy compromise.
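One way to picture the checklist idea is the sketch below. The handler names, the DISPOSAL table, and the default disposal are hypothetical, and the deliberately broad except stands in for the "bare try/except" floated above; none of this is Mailman's actual pipeline code.

```python
# Hypothetical policy: what to do with a message that failed at a given step.
DISPOSAL = {"ToArchive": "requeue", "ToUsenet": "requeue"}


def run_pipeline(handlers, mlist, msg):
    """Run each (name, func) handler; return the names that failed."""
    failed = []
    for name, func in handlers:
        try:
            func(mlist, msg)
        except Exception:
            # Record the failure and carry on with the remaining handlers.
            failed.append(name)
    return failed


def dispose(failed, default="log"):
    """Map each failed handler name to its disposal."""
    return {name: DISPOSAL.get(name, default) for name in failed}
```

The list passed in is the record of which modules the message was meant to go through, and the return value of run_pipeline is the record of which ones failed.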
TW> (Then again, I haven't seen failures at all, yet, so I'm not
TW> too worried for myself.)
Me neither! :)
-Barry
On Tue, Mar 28, 2000 at 03:00:18PM -0500, Barry A. Warsaw wrote:
"TW" == Thomas Wouters <thomas@xs4all.net> writes:
TW> But, aside from delivery, it might be useful to store
TW> messages which failed elsewhere in the pipeline too; messages
TW> in the archive pipe, for instance, or the usenet pipe. It can
TW> currently happen, for instance because of a deadlock, that
TW> messages just get lost. I haven't looked at the new code yet,
TW> but imho it shouldn't be too hard to push messages back into
TW> those pipelines, assuming they fail 'cleanly' (and not with
TW> files half-written or some such.)
That's actually a good idea. I think a wrapper around the pipeline loop, perhaps using a bare try/except (hmm...) is the way to go. What you'd probably have to do is have a checklist of delivery modules so you know 1) which ones you wanted to send the message through; 2) which ones failed. And then to know what the disposal is for a message that failed at a particular step. Definitely more complicated, but worth thinking about. Robustifying message delivery should be very high on the list, but for 2.0 final we'll have to find a happy compromise.
How about a simple try/except in those two areas? They are pretty isolated, and you can add the try/except and restart-delivery code just after the forks those portions do. (No need for the queue runner to fork, I guess... but it could, if necessary.)
Actually, I think I'll post a diff tomorrow morning, after I have some time to think 'bout it ;) I already see one problem though: the new code eats the unixfrom line the same way moderation does, screwing up the archives:
    # calculate a unique name for this file
    text = str(msg)
    filebase = sha.new(text).hexdigest()
    msgfile = os.path.join(mm_cfg.QUEUE_DIR, filebase + '.msg')
'str(msg)' does not dump the unixfrom line (unless you want to fix this in Mailman.Message.Message), so you need to use 'text = msg.unixfrom + str(msg)'. See the patch I sent, uhm, sometime this weekend. Also, one of the comments in qrunner seems to be too literally copy/pasted from cron/gate_news ;)
Rgdrs,
Thomas Wouters <thomas@xs4all.net>
On Tue, Mar 28, 2000 at 11:14:00PM +0200, Thomas Wouters wrote:
On Tue, Mar 28, 2000 at 03:00:18PM -0500, Barry A. Warsaw wrote:
[ about the new queueing of failed messages, and implementing that also in ToArchive and ToUsenet ]
Actually, I think I'll post a diff tomorrow morning, after I have some time to think 'bout it ;)
Well, I didn't make it the next morning, and I'm still thinking about it. Should it be integrated with the pipeline architecture, or be module-specific? I mean, it could be done two ways:
- inside each 'process' function for every module that might want to requeue, in the form of
    def process(mlist, msg):
        try:
            <original code here>
        except TemporaryFailure:
            Utils.queue_message(mlist, msg, <re-injection point>)
and a
    def reprocess(mlist, msg):
        <code that reinjects message>
Most process()es can probably just call reprocess() after some basic checking, forking, message-header-editing, etc. reprocess() should raise TemporaryFailure, but not catch it itself -- the queue runner should catch it and update the pickled state for that message.
- inside the pipeline structure, in the pipeline delivery.
This would require all handlers to have a reprocess() function, but most (those that will never raise TemporaryFailure) can have it just 'pass'. (Or perhaps just leave them out... that would raise AttributeError when the impossible happens, instead of silently vanishing messages)
The pipeline itself would catch TemporaryFailure, and queue the messages not only with the list, message and what pipeline segment it broke at, but also the rest of the pipeline still to be traversed. Might prove a bit more tricky, but it's a lot more elegant if more than a few modules support the queueing interface :P
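The second (pipeline-level) option might be sketched as follows. TemporaryFailure, process() and reprocess() are the names proposed above; deliver, requeue_run, and the in-memory queue list are hypothetical stand-ins for the real pipeline loop and the on-disk qfiles.

```python
class TemporaryFailure(Exception):
    """Raised by a handler when a retry later might succeed."""


def deliver(pipeline, mlist, msg, queue):
    """Run handlers in order; on TemporaryFailure, queue the unfinished tail."""
    for i, handler in enumerate(pipeline):
        try:
            handler.process(mlist, msg)
        except TemporaryFailure:
            # Save the failing handler plus everything still to be traversed.
            queue.append((mlist, msg, pipeline[i:]))
            return False
    return True


def requeue_run(queue):
    """Retry queued entries: reprocess() the failed step, then finish the rest."""
    pending = list(queue)
    queue[:] = []
    for mlist, msg, rest in pending:
        try:
            rest[0].reprocess(mlist, msg)
        except TemporaryFailure:
            # Still failing: keep it queued at the same re-injection point.
            queue.append((mlist, msg, rest))
            continue
        deliver(rest[1:], mlist, msg, queue)
```

Only the handler that raised needs a reprocess() here; a handler without one that somehow ends up at the head of a queued entry fails loudly with AttributeError, which matches the "leave them out" variant.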
Comments welcome, but I'm off for a long weekend in Rome. I won't be back until Tuesday, and I won't read my mail in between ;)
-- Thomas Wouters <thomas@xs4all.net>
I'm trying to reach closure on this unixfrom issue. Here's my take from scanning the back archives:
The standard Python library module mailbox.py was dropping the unixfrom lines. This didn't have much of a real effect on Mailman, but Guido checked in a fix anyway. We can ignore this.
The unixfrom lines were not getting properly included in held messages, which broke the Pipermail archiver.
The unixfrom lines were also not getting stored in the qfiles/*.msg files for delivery pipeline failures.
It seems to me that both of the last two problems are best fixed by patching the Mailman.Message.Message.__str__() method as given below. I believe Thomas Wouters suggested this patch in one of his last followups on the subject. It appears to fix the problem with approved held messages not getting properly archived.
This is the patch I intend to check in to fix the problem. If I've missed something please let me know.
-Barry
-------------------- snip snip --------------------
Index: Message.py
RCS file: /cvsroot/mailman/mailman/Mailman/Message.py,v
retrieving revision 1.29
diff -c -r1.29 Message.py
*** Message.py  2000/05/08 22:23:17     1.29
--- Message.py  2000/05/31 18:14:57
*** 55,62 ****
          self.body = self.fp.read()

      def __str__(self):
!         # TBD: should this include the unixfrom?
!         return string.join(self.headers, '') + '\n' + self.body

      def GetSender(self, use_envelope=None):
          """Return the address considered to be the author of the email.
--- 55,61 ----
          self.body = self.fp.read()

      def __str__(self):
!         return self.unixfrom + string.join(self.headers, '') + '\n' + self.body

      def GetSender(self, use_envelope=None):
          """Return the address considered to be the author of the email.
"BAW" == Barry A Warsaw <bwarsaw@python.org> writes:
BAW> It seems to me that both of the last two problems are best
BAW> fixed by patching the Mailman.Message.Message.__str__()
BAW> method as given below.
Except that breaks postings :(
SMTPDirect.py uses str(msg) too and inclusion of From_ in the body of the message seems to give at least Postfix all manner of willies. Maybe the thing to do is include another Message method which returns just the headers and body, sans the unixfrom, and use this in SMTPDirect and Sendmail?
-Barry
"BAW" == Barry A Warsaw <bwarsaw@python.org> writes:
BAW> SMTPDirect.py uses str(msg) too and inclusion of From_ in the
BAW> body of the message seems to give at least Postfix all manner
BAW> of willies. Maybe the thing to do is include another Message
BAW> method which returns just the headers and body, sans the
BAW> unixfrom, and use this in SMTPDirect and Sendmail?
<bling!>
Is it too much of a kludge to make __repr__() return the entire message, including the From_ header, and make __str__() return just the rfc822 headers and body? I.e.
repr(msg) == msg.unixfrom + str(msg)
I think this does the trick, as long as ListAdmin and Message.Enqueue both use repr(msg) instead of str(msg). Sendmail and SMTPDirect continue to use str(msg).
I've tested this and it all appears to work.
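The str/repr split can be illustrated with a toy class. This is only a sketch: Message below is a stand-in with the same three attributes, not the real Mailman.Message.Message, and the constructor is an assumption.

```python
class Message:
    """Toy stand-in for Mailman.Message.Message (not the real class)."""

    def __init__(self, unixfrom, headers, body):
        self.unixfrom = unixfrom  # envelope line, e.g. "From a@example.com ...\n"
        self.headers = headers    # list of header lines, each ending in "\n"
        self.body = body

    def __str__(self):
        # RFC 822 headers and body only: what Sendmail/SMTPDirect hand off.
        return "".join(self.headers) + "\n" + self.body

    def __repr__(self):
        # Full mbox form, envelope line included: for held or queued messages.
        return self.unixfrom + str(self)
```

With this split, anything written to disk for the archiver uses repr(msg), while anything handed to the MTA uses str(msg) and never sees the From_ line.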
-Barry
On Wed, May 31, 2000 at 03:36:57PM -0400, Barry A. Warsaw wrote:
"BAW" == Barry A Warsaw <bwarsaw@python.org> writes:
BAW> SMTPDirect.py uses str(msg) too and inclusion of From_ in the
BAW> body of the message seems to give at least Postfix all manner
BAW> of willies. Maybe the thing to do is include another Message
BAW> method which returns just the headers and body, sans the
BAW> unixfrom, and use this in SMTPDirect and Sendmail?
Is it too much of a kludge to make __repr__() return the entire message, including the From_ header, and make __str__() return just the rfc822 headers and body? I.e.
repr(msg) == msg.unixfrom + str(msg)
Yeah, this works. My initial reserve against making str(msg) return the unixfrom line as well was that it broke 'compatibility' with the regular rfc822.Message. The SMTPDirect breakage shows that, I guess ;-) Using repr is a good idea, I think, but it's missing one thing: if unixfrom is empty, the mailbox will still be fawlty. I think __repr__ should reproduce a .unixfrom line if it's missing. I'll post a patch later today, I have to run off to a huge sale at the local bookstore ;-)
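A sketch of what "reproduce a unixfrom line if it's missing" could mean, assuming the usual mbox convention of "From", the sender address, and an asctime-style date. The helper name ensure_unixfrom and the choice of which address to use are assumptions, not taken from any posted patch.

```python
import time


def ensure_unixfrom(unixfrom, sender, when=None):
    """Return `unixfrom` unchanged, or build a From_ line if it is empty.

    `when` is seconds since the epoch; None means the current time.
    """
    if unixfrom:
        return unixfrom
    # mbox convention: "From <address> <asctime-style date>\n"
    stamp = time.asctime(time.gmtime(when))
    return "From %s %s\n" % (sender, stamp)
```

A __repr__ along the lines discussed would then return ensure_unixfrom(self.unixfrom, sender) + str(self), so the mailbox file always gets a separator line.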
-- Thomas Wouters <thomas@xs4all.net>
"TW" == Thomas Wouters <thomas@xs4all.net> writes:
TW> Yeah, this works. My initial reserve against making str(msg)
TW> return the unixfrom line as well was that it broke
TW> 'compatibility' with the regular rfc822.Message.
I think the only place where concrete rfc822.Message objects are used (as opposed to Mailman.Message.Message objects or derived), is in the bounce detector. At least, everything else /should/ use Mailman's own Message class or one derived from there. So I'm not worried about b/w compatibility.
TW> Using repr is a good idea, I think, but it's missing one
TW> thing: if unixfrom is empty, the mailbox will still be
TW> fawlty. I think __repr__ should reproduce a .unixfrom line if
TW> it's missing. I'll post a patch later today, I have to run off
TW> to a huge sale at the local bookstore ;-)
Cool. -Barry
participants (3): Barry A. Warsaw, bwarsaw@python.org, Thomas Wouters