
A.M. Kuchling wrote:
How should I write the code to extract the ID? Looking through the bounce test messages, there are various formats so we'll need several functions, similar to how there are several bounced-address parsers in Mailman.Bouncers. Should I:
I don't think we do.
I ran the following
import os
import re
import email

hre = re.compile('^>?\s*message-id:\s*(<.*>)', re.IGNORECASE)
for f in os.listdir('.'):
    if not f.endswith('.txt'):
        continue
    msg = email.message_from_file(open(f))
    messageid = None
    inheaders = True
    for line in msg.as_string().splitlines():
        if inheaders:
            if line == '':
                inheaders = False
            continue
        mo = hre.search(line)
        if mo:
            messageid = mo.group(1)
            break
    print '%s: %s' % (f, messageid)
in current Mailman's test/bounces/ directory which contains 86 DSNs. Of those 86, 12 have no message id for the original message. Of the remaining 74, all message ids are found with the above.
If the re is changed to
hre = re.compile('^message-id:\s*(<.*>)', re.IGNORECASE)
73 of the 74 are found. The one miss is llnl_01.txt, which has the 'original message' quoted with '>' characters. A few messages have the message id in a report section with leading whitespace, but they all have it later as well without leading whitespace.
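To make the difference between the two patterns concrete, here is a small sketch that runs both against three made-up lines: a plain header, a '>'-quoted header like the one in llnl_01.txt, and a header with leading whitespace like the ones in the report sections. The sample lines and the names strict, lenient and match_id are mine, purely for illustration; they are not taken from the test files.

```python
import re

# The two patterns discussed above, written as raw strings.
strict = re.compile(r'^message-id:\s*(<.*>)', re.IGNORECASE)
lenient = re.compile(r'^>?\s*message-id:\s*(<.*>)', re.IGNORECASE)

# Made-up sample lines (assumptions, not copied from test/bounces/).
samples = [
    'Message-ID: <abc@example.com>',        # plain header line
    '> Message-Id: <abc@example.com>',      # quoted with '>'
    '    Message-ID: <abc@example.com>',    # leading whitespace
]

def match_id(pattern, line):
    """Return the captured message id, or None if the line does not match."""
    mo = pattern.search(line)
    return mo.group(1) if mo else None

# One (line, strict result, lenient result) tuple per sample.
results = [(line, match_id(strict, line), match_id(lenient, line))
           for line in samples]

for line, s, l in results:
    print('%-40r strict=%s lenient=%s' % (line, s, l))
```

The stricter pattern only matches the first sample, while the one with the optional '>' and whitespace matches all three.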
In any case, I think the
hre = re.compile('^>?\s*message-id:\s*(<.*>)', re.IGNORECASE)
re will likely find anything to be found and is unlikely to find false hits.
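If a single function is wanted rather than a script, the same loop could be wrapped up roughly as follows. This is only a sketch: the name find_original_message_id is hypothetical, it takes the flattened message text as a string (what msg.as_string() returns in the script above), and the sample bounce is invented for the example.

```python
import re

HRE = re.compile(r'^>?\s*message-id:\s*(<.*>)', re.IGNORECASE)

def find_original_message_id(text):
    """Return the first message id found after the outer headers, or None.

    Hypothetical helper: 'text' is assumed to be the whole bounce as one
    string. The outer headers are skipped so that the bounce's own
    Message-ID is not mistaken for the original message's id.
    """
    inheaders = True
    for line in text.splitlines():
        if inheaders:
            # The first empty line ends the outer header block.
            if line == '':
                inheaders = False
            continue
        mo = HRE.search(line)
        if mo:
            return mo.group(1)
    return None

# Invented bounce for illustration only.
bounce = '\n'.join([
    'From: MAILER-DAEMON@example.com',
    'Message-ID: <bounce-1@example.com>',
    'Subject: Delivery failure',
    '',
    'Your message could not be delivered.',
    '',
    '> From: list-bounces@example.org',
    '> Message-ID: <original-42@example.org>',
])

print(find_original_message_id(bounce))
```

Returning None when nothing matches covers the 12 DSNs that carry no message id for the original message.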
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan