Part of RFC 822 ignored by email module
Martin Gregorie
martin at address-in-sig.invalid
Thu Jan 20 18:52:18 EST 2011
On Thu, 20 Jan 2011 17:58:36 -0500, Bob Kline wrote:
> Thanks. I'm not sure everyone would agree that it's OK to collapse
> multiple consecutive spaces into one, but I'm beginning to suspect that
> those more concerned with preserving as much as possible of the original
> message are in the minority. It sounds like my take-home distillation
> from this thread is "yes, the module ignores what the spec says about
> unfolding, but it doesn't matter." I guess I can live with that.
>
I've been doing stuff in this area with the JavaMail package, though not
as yet in Python. I've learnt that if you parse the headers you can extract
values that work well for comparisons, as database keys, etc. but are not
guaranteed to let you reconstitute the original header byte for byte. If
preserving the message exactly as received matters, the solution is to
parse the message to extract the headers and MIME parts the application
needs to carry out its function, but keep the original, unparsed
message so you can pass it on.
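
A minimal Python sketch of that approach, assuming a current Python 3
email package. The function name index_message and the dictionary keys
are hypothetical, chosen only for illustration: the parsed values serve
as lookup keys, and the raw bytes are stored untouched for forwarding.

```python
from email import message_from_bytes, policy


def index_message(raw: bytes) -> dict:
    """Extract comparison-friendly keys and keep the original bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    subject = msg["Subject"] or ""
    return {
        # Parsed values: good for comparisons and database keys only.
        "message_id": str(msg["Message-ID"] or "").strip(),
        # Collapse every run of whitespace (including folds) to one space.
        "subject": " ".join(str(subject).split()),
        # The unparsed message, kept byte for byte so it can be passed on.
        "raw": raw,
    }


# Hypothetical sample message with a folded Subject header.
raw = (
    b"Message-ID: <1@example.invalid>\r\n"
    b"Subject: two  spaces\r\n"
    b"\tand a fold\r\n"
    b"\r\n"
    b"body\r\n"
)
rec = index_message(raw)
```

Here rec["subject"] is a normalised key that no longer matches the
original header text, while rec["raw"] is still identical to the input.
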
The other gotcha is assuming that the MUA author read and understood the
RFCs. Very many barely glanced at RFCs and/or misunderstood them.
Consequently, if you use strict parsing you'll be surprised how many
messages get rejected for having invalid headers or MIME headers. For
instance, the mistakes some MUAs make when outputting To, CC and BCC
headers with multiple addresses have to be seen to be believed. If the
Python e-mail module lets you, set it to use lenient parsing. If this
isn't an option you may well find yourself having to fix up messages
before you can parse them successfully.
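
For reference, a hedged sketch of the difference between lenient and
strict parsing in Python, assuming a current Python 3 email package,
whose parser records problems in msg.defects by default instead of
rejecting the message. The sample message is made up for illustration;
it declares a multipart body without a boundary parameter.

```python
from email import message_from_bytes, policy
from email.errors import MessageDefect

# Hypothetical malformed message: multipart type, but no boundary given.
raw = (
    b"To: Alice <alice@example.invalid>, bob@example.invalid\r\n"
    b"Content-Type: multipart/mixed\r\n"
    b"\r\n"
    b"body\r\n"
)

# Lenient (the default): the message parses and problems are recorded.
msg = message_from_bytes(raw)
defect_names = [type(d).__name__ for d in msg.defects]

# Strict: the same policy, except that the first defect is raised.
strict = policy.compat32.clone(raise_on_defect=True)
try:
    message_from_bytes(raw, policy=strict)
    strict_rejected = False
except MessageDefect:
    strict_rejected = True
```

With lenient parsing the headers remain readable and the application can
inspect defect_names to decide whether the message needs fixing up; with
the strict policy the same message is rejected outright.
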
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |