[Email-SIG] fixing the current email module

Barry Warsaw barry at python.org
Thu Oct 8 03:17:47 CEST 2009


On Oct 3, 2009, at 1:09 PM, Timothy Farrell wrote:

> Forgive my ignorance...why does converting bytes to strings have to  
> be a mess?  Rather than having two Feedparsers, can't we just pass a  
> default encoding when instantiating a feedparser and have it read  
> from the MIME headers otherwise?  If no encoding is passed and one  
> can't be determined, simply output as bytes or try a default and  
> raise an exception if it fails.

A lot of work went into the parser the last (successful) time around  
to avoid exceptions as much as possible.  That's why Message objects  
have a .defects attribute.
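
As a rough illustration of that behavior, here is a minimal sketch written
against the email package as it ships in later Python 3 releases (an
assumption; it is not the code under discussion in this thread).  A
multipart message with no boundary parameter still parses, and the problem
is recorded on the message instead of being raised:

```python
import email

# A deliberately malformed message: multipart Content-Type, no boundary.
# The addresses and subject are made up for the example.
RAW = (
    "From: sender@example.com\n"
    "To: rcpt@example.com\n"
    "Subject: broken\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "body text\n"
)

msg = email.message_from_string(RAW)

# No exception was raised; the parser noted what it found instead.
defect_names = [type(d).__name__ for d in msg.defects]
print(defect_names)
```

The headers remain readable as usual (msg["Subject"] is still available),
so the caller gets a usable object plus a list of what was wrong with it.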

I'm more okay with the APIs that are used to hand-craft or modify  
existing messages throwing exceptions when something bad happens, e.g.  
when an unknown charset is used.  But the parser itself should never  
throw an exception.  The use case here is:

Our MTA has dropped a message on disk and it could be deliberately  
malformed spam.  We don't know that until we parse it though, so we  
must be able to construct a reasonable message tree from the raw bytes  
we read off disk.  The defects the parser encounters are in fact  
useful information that goes into a determination of ham/spam.

The key thing here is that clients of the email package are severely  
handicapped at handling any parsing errors.  Mailman for example can't  
do much except log the error and throw the message into a 'bad'  
bucket.  Whoop-de-doo!  Nobody can do anything about it! If we can at  
least give the system a Message object with defects, the system can  
reason about it and help the human decide what to do.
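
One way a client might reason about defects is sketched below.  This is
only an illustration: the triage() function and the "held"/"accepted"
bucket names are hypothetical, and email.message_from_bytes() is assumed to
be available, as it is in later Python 3 releases.

```python
import email


def triage(raw_bytes):
    """Hypothetical helper: parse raw bytes and pick a bucket.

    Returns (bucket, message, reasons).  The parser is expected not to
    raise on malformed input, so the caller always gets a message tree.
    """
    msg = email.message_from_bytes(raw_bytes)
    reasons = []
    # Defects are recorded per part, so walk the whole tree.
    for part in msg.walk():
        reasons.extend(type(d).__name__ for d in part.defects)
    bucket = "held" if reasons else "accepted"
    return bucket, msg, reasons


good = b"From: a@example.com\nSubject: hi\n\nhello\n"
bad = b"From: a@example.com\nContent-Type: multipart/mixed\n\nhello\n"

print(triage(good)[0])
print(triage(bad)[0], triage(bad)[2])
```

The defect names could just as well be fed into a spam score or shown to
a list moderator; the point is that the decision moves to the client
instead of being forced by an exception.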

The generator is probably in a similar situation.  If you hand it a  
Message object, it must generate something.  In the case of a message  
with defects, we can make compromises though, such as giving up on  
idempotency, fixing MIME boundaries, substituting legal/known  
charsets, etc.
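
As one possible shape for the charset substitution mentioned above, here
is a small sketch.  The known_charset() helper and the us-ascii fallback
are assumptions made for illustration, not part of the email package, and
the check relies on Python's codec registry, whose names do not always
match MIME charset names.

```python
import codecs


def known_charset(name, fallback="us-ascii"):
    """Hypothetical helper: return name if Python has a codec for it,
    otherwise the fallback charset."""
    if not name:
        return fallback
    try:
        codecs.lookup(name)
    except LookupError:
        # Unknown charset: substitute something the generator can emit.
        return fallback
    return name


print(known_charset("utf-8"))
print(known_charset("x-no-such-charset"))
```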

-Barry


