Automated mail bounce handling.
alanmk at hotmail.com
Fri Dec 3 13:53:55 CET 2004
I'm writing an application that sends out emails for workflow items.
I am using a VERP-style addressing mechanism, whereby I send each
message from a uniquely generated email address, so that I can relate
bounces, notification failures, etc., to the original address to which
the email was sent.
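A minimal sketch of that kind of per-message return path, assuming a random token per message and an in-memory dict standing in for whatever store the application really uses. The class name `ReturnPathRegistry`, the `wf-` prefix and the domain `bounces.example.com` are hypothetical placeholders, not part of any library. (Classic VERP instead folds the recipient address itself into the local part, which avoids the lookup table.)

```python
import secrets
from typing import Dict, Optional

BOUNCE_DOMAIN = "bounces.example.com"  # hypothetical domain routed to the bounce handler


class ReturnPathRegistry:
    """Maps per-message envelope senders back to the original recipient.

    The dict is a stand-in for what would normally be a database table.
    """

    def __init__(self, domain: str = BOUNCE_DOMAIN) -> None:
        self.domain = domain.lower()
        self._recipients: Dict[str, str] = {}

    def new_return_path(self, recipient: str) -> str:
        # 8 random bytes -> 16 lowercase hex chars; collisions are vanishingly unlikely.
        token = secrets.token_hex(8)
        self._recipients[token] = recipient
        return f"wf-{token}@{self.domain}"

    def lookup(self, address: str) -> Optional[str]:
        # Accept bare addresses or "<...>" as found in a Return-Path header;
        # compare case-insensitively because some MTAs change the case.
        local, _, domain = address.strip().strip("<>").rpartition("@")
        local = local.lower()
        if domain.lower() != self.domain or not local.startswith("wf-"):
            return None
        return self._recipients.get(local[len("wf-"):])
```

The generated address has to be used as the SMTP envelope sender (for example the `from_addr` argument of `smtplib.SMTP.sendmail`), because bounces go to the envelope sender rather than to the `From:` header.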
The problem is distinguishing between the different types of message that
can come back to that address. For example, if the message was
undeliverable for "permanent" reasons, e.g. user moved on, invalid email
address, etc., then I get back an email to the unique address, which
needs to be parsed in order to find the reason for failure. But I also
get back vacation emails to the same address, e.g. "I'm out of the
office at the moment, I'll read your email when I get back". The former
should be recognised by the application, so that appropriate action can
be taken, i.e. asking an admin for a new address. But the latter should
essentially be ignored, because it doesn't affect whether the recipient
actually received the email.
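One standards-based way to separate the two cases, sketched here only as a heuristic: MTAs that follow RFC 3464 report failures as a `multipart/report` message with `report-type=delivery-status`, whose per-recipient `Status` field carries an RFC 3463 code (5.x.x permanent, 4.x.x transient), while RFC 3834 asks automatic responders such as vacation programs to add an `Auto-Submitted: auto-replied` header. Python's `email` parser turns a `message/delivery-status` part into a list of header blocks. The function names and the category labels are hypothetical, and mail that follows neither convention simply falls through to "unknown".

```python
from email import message_from_string
from email.message import Message
from email.utils import collapse_rfc2231_value
from typing import List


def dsn_statuses(report: Message) -> List[str]:
    """Collect the per-recipient Status codes from an RFC 3464 report."""
    statuses = []
    for part in report.walk():
        if part.get_content_type() == "message/delivery-status":
            # The parser represents this body as a list of header blocks:
            # the first holds per-message fields, each later one a recipient.
            for block in part.get_payload():
                status = block.get("Status")
                if status:
                    statuses.append(status.strip())
    return statuses


def classify_return_mail(raw: str) -> str:
    """Hypothetical triage: 'permanent', 'transient', 'auto-reply' or 'unknown'."""
    msg = message_from_string(raw)
    report_type = collapse_rfc2231_value(msg.get_param("report-type", "")).lower()
    if msg.get_content_type() == "multipart/report" and report_type == "delivery-status":
        statuses = dsn_statuses(msg)
        # RFC 3463 status classes: 5.x.x permanent failure, 4.x.x transient.
        if any(s.startswith("5") for s in statuses):
            return "permanent"
        if any(s.startswith("4") for s in statuses):
            return "transient"
        return "unknown"
    # RFC 3834: any Auto-Submitted value other than "no" marks the message
    # as machine-generated, which is what vacation responders should send.
    auto_submitted = (msg.get("Auto-Submitted") or "no").strip().lower()
    if auto_submitted != "no":
        return "auto-reply"
    return "unknown"
```

This only helps for the fraction of mail servers and responders that actually follow these RFCs, so it would sit in front of, not replace, any fallback text matching.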
I've done a little research and found the following approaches:
1. RFC 1839, which specifies header values giving easy-to-parse
reason-codes for the return mail. But I'm uncertain as to how widely
supported RFC 1839 is "in the wild".
2. The Mailman approach, which is essentially the linear application of
algorithmic matchers, each of which is given a chance to recognise
bounced mails, in order to determine the nature of the bounce. AFAICT,
this method involves a significant amount of coding, as it requires
writing a new matcher for each MTA/MUA in existence (surprise, surprise,
they all do it slightly differently). I see that the Mailman
distribution comes with a couple of dozen matchers, which is obviously far
from complete. I'd prefer not to go down this path, since I could end up
writing hundreds of matchers as I incrementally discover the different
styles that real-world M[T|U]As use. And Mailman is GPL-licensed, which is a
problem in this case.
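The matcher-chain structure itself is small; the cost lies in the number of matchers. A sketch of the structure only, not Mailman's actual code or API: each matcher either recognises the text and returns a `(kind, evidence)` pair or returns `None`, and the first hit wins. The regular expressions are hypothetical examples, not a catalogue of real MTA wording.

```python
import re
from typing import Callable, List, Optional, Tuple

# A matcher inspects the message text and returns (kind, evidence) or None.
Matcher = Callable[[str], Optional[Tuple[str, str]]]


def regex_matcher(kind: str, pattern: str) -> Matcher:
    """Build a matcher that reports `kind` when `pattern` occurs in the text."""
    compiled = re.compile(pattern, re.IGNORECASE | re.MULTILINE)

    def match(text: str) -> Optional[Tuple[str, str]]:
        m = compiled.search(text)
        return (kind, m.group(0)) if m else None

    return match


# Illustrative patterns only; every MTA words its bounces differently.
MATCHERS: List[Matcher] = [
    regex_matcher("permanent", r"user unknown|no such user|does not exist"),
    regex_matcher("transient", r"mailbox full|over quota|try again later"),
    regex_matcher("auto-reply", r"out of (the )?office|on vacation"),
]


def run_matchers(text: str, matchers: List[Matcher] = MATCHERS) -> Optional[Tuple[str, str]]:
    """Give each matcher a chance in order; return the first non-None result."""
    for matcher in matchers:
        result = matcher(text)
        if result is not None:
            return result
    return None
```

Order matters: a message matching several patterns is classified by whichever matcher comes first.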
I am trying to think of more robust and less costly (in coding time)
approaches. Maybe some form of text-matching algorithm, such as
1. Bayesian classification?
2. Keyword recognition?
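As a rough illustration of the Bayesian option, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only. The class name, the tokenizer and the four training snippets are hypothetical toy data; a real classifier would be trained on bounces and auto-replies actually received at the return addresses.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens; deliberately crude."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: Set[str] = set()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def classify(self, text: str) -> Optional[str]:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(token | label), smoothed.
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, not a real corpus.
TRAINING = [
    ("delivery failed user unknown no such mailbox", "bounce"),
    ("permanent error address rejected recipient does not exist", "bounce"),
    ("out of the office will reply when i get back", "vacation"),
    ("on vacation until monday limited access to email", "vacation"),
]
```

A keyword-recognition variant could reuse `tokenize` and swap the learned counts for a hand-picked list of weighted terms.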
I'd be grateful for any pointers to, or suggestions for, existing Python
solutions to this problem.
email alan: http://xhaus.com/contact/alan