[Email-SIG] fixing the current email module

Barry Warsaw barry at python.org
Mon Oct 12 22:30:28 CEST 2009


On Oct 10, 2009, at 9:59 AM, Stephen J. Turnbull wrote:

> Both.  I *believe* (but it needs to be checked) that in a correctly
> formed multipart MIME object (message or part), any internal structure
> is context-free within the MIME boundaries.  If that is so, then
> individual parts of the object can be stored in raw form and parsed
> lazily.

I too /think/ that's correct.  There are some MIME content-types that
cause parts to be related (e.g. multipart/alternative and
multipart/related), but those are all operating at a higher level.

In practice it probably makes sense to parse all the headers right
away.  Content-Type has the most bearing on how the rest of the
message is parsed, and you already have to parse its parameters to
get, e.g., the boundary.  Early on I claimed that headers were so
manageable in practice that we could implement an ordered dictionary
with duplicates as a simple list, with linear searching, and nobody
would notice.  I think nobody has noticed ;).
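
To make the "simple list with linear searching" idea concrete, here is a
minimal sketch.  HeaderList and its method names are hypothetical and
only for illustration; this is not the email package's actual code.  The
boundary lookup at the end uses the stdlib Message.get_boundary().

```python
from email.message import Message


class HeaderList:
    """Ordered header store that allows duplicates (illustrative only)."""

    def __init__(self):
        # (name, value) tuples, kept in the order they were added.
        self._headers = []

    def add(self, name, value):
        self._headers.append((name, value))

    def get(self, name, default=None):
        # Linear, case-insensitive search; returns the first match.
        name = name.lower()
        for key, value in self._headers:
            if key.lower() == name:
                return value
        return default

    def get_all(self, name):
        # Every value stored under this name, in original order.
        name = name.lower()
        return [value for key, value in self._headers if key.lower() == name]


headers = HeaderList()
headers.add("Received", "from a.example by b.example")
headers.add("Received", "from b.example by c.example")
headers.add("Content-Type", 'multipart/mixed; boundary="XYZ"')

# Let the stdlib parse the Content-Type parameters to get the boundary.
probe = Message()
probe["Content-Type"] = headers.get("content-type")
boundary = probe.get_boundary()
```

Real messages rarely carry more than a few dozen headers, which is why
the linear scan is unlikely to show up in practice.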

Lazy parsing of the body does make sense.  You only need to parse  
enough to find end boundaries, or recurse into parsing an embedded  
part.  This is how the parser currently works anyway.
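
A rough sketch of the lazy approach, assuming the boundary is already
known from the headers.  LazyPart and split_parts are hypothetical names
for illustration, and the scan is simplified: it ignores trailing
whitespace on boundary lines and leaves the line ending before a
boundary attached to the part.  It is not how the current parser is
actually written.

```python
import email


class LazyPart:
    """Holds one part's raw text and parses it only on first access."""

    def __init__(self, raw):
        self.raw = raw
        self._parsed = None

    @property
    def message(self):
        if self._parsed is None:
            self._parsed = email.message_from_string(self.raw)
        return self._parsed


def split_parts(body, boundary):
    """Split a multipart body into LazyPart objects without parsing them."""
    delimiter = "--" + boundary
    closing = delimiter + "--"
    parts = []
    current = None  # None while in the preamble or after the end boundary
    for line in body.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        if stripped == closing:
            if current is not None:
                parts.append(LazyPart("".join(current)))
            current = None
            break  # everything after the end boundary is epilogue
        if stripped == delimiter:
            if current is not None:
                parts.append(LazyPart("".join(current)))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        # No end boundary was found; keep what was collected.
        parts.append(LazyPart("".join(current)))
    return parts


body = (
    "preamble, ignored\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hello</p>\n"
    "--XYZ--\n"
    "epilogue, ignored\n"
)
parts = split_parts(body, "XYZ")
```

Nothing inside a part is parsed until its message property is read, so
an embedded multipart with its own boundary stays as raw text until
somebody asks for it.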

-Barry


