[Email-SIG] fixing the current email module
Barry Warsaw
barry at python.org
Mon Oct 12 22:30:28 CEST 2009
On Oct 10, 2009, at 9:59 AM, Stephen J. Turnbull wrote:
> Both. I *believe* (but it needs to be checked) that in a correctly
> formed multipart MIME object (message or part), any internal structure
> is context-free within the MIME boundaries. If that is so, then
> individual parts of the object can be stored in raw form and parsed
> lazily.
I too /think/ that's correct. There are some MIME content-types that
cause parts to be related (e.g. multipart/alternative and
multipart/related), but those are all operating at a higher level.
In practice it probably makes sense to parse all the headers right
away. Content-Type has the most bearing on parsing the rest of the
stuff, so by that time you already need to parse its parameters to e.g.
get the boundary. Early on I claimed that headers were so manageable
in practice that we could implement an ordered dictionary with
duplicates as a simple list with linear searching, and nobody would
notice. I think nobody has noticed ;).
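To illustrate the "simple list with linear searching" idea, here is a minimal sketch. It is not the email package's actual code; the class name HeaderList and its methods are hypothetical, chosen only for this example.

```python
class HeaderList:
    """Ordered header store that allows duplicates (hypothetical sketch).

    Headers live in a plain list of (name, value) pairs, so insertion
    order and repeated fields such as Received are preserved for free.
    """

    def __init__(self):
        self._headers = []

    def add(self, name, value):
        # Appending keeps the original order and never overwrites.
        self._headers.append((name, value))

    def get(self, name, default=None):
        # Linear, case-insensitive search; returns the first match.
        wanted = name.lower()
        for key, value in self._headers:
            if key.lower() == wanted:
                return value
        return default

    def get_all(self, name):
        # Every value for a (possibly repeated) field, in order.
        wanted = name.lower()
        return [value for key, value in self._headers if key.lower() == wanted]


headers = HeaderList()
headers.add("Received", "from a.example.com")
headers.add("Received", "from b.example.com")
headers.add("Content-Type", 'multipart/mixed; boundary="XYZ"')
```

A typical message carries a few dozen headers at most, so the linear scan in get() is unlikely to matter in practice.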
Lazy parsing of the body does make sense. You only need to parse
enough to find end boundaries, or recurse into parsing an embedded
part. This is how the parser currently works anyway.
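A rough sketch of the eager-headers, lazy-body idea follows. It is an illustration under simplifying assumptions, not how the email package's parser is written: LazyPart and _split_on_boundary are hypothetical names, and the boundary scan ignores details such as trailing whitespace on delimiter lines. Only HeaderParser and Message.get_boundary() come from the standard library.

```python
from email.parser import HeaderParser


def _split_on_boundary(body, boundary):
    """Return the raw text of each part between boundary delimiter lines."""
    delimiter = "--" + boundary
    chunks, current = [], None
    for line in body.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        if stripped == delimiter + "--":
            break                      # end boundary: stop scanning
        if stripped == delimiter:
            if current is not None:
                chunks.append("".join(current))
            current = []               # start collecting a new part
        elif current is not None:
            current.append(line)       # lines before the first delimiter (preamble) are skipped
    if current is not None:
        chunks.append("".join(current))
    return chunks


class LazyPart:
    """Headers are parsed immediately; the body stays raw until needed."""

    def __init__(self, raw):
        # HeaderParser parses only the headers and keeps the rest as a string.
        self.headers = HeaderParser().parsestr(raw)
        self.raw_body = self.headers.get_payload() or ""
        self._subparts = None          # not parsed yet

    @property
    def subparts(self):
        # Parse one level of nesting on first access, then cache it.
        if self._subparts is None:
            boundary = self.headers.get_boundary()
            if boundary is None:
                self._subparts = []
            else:
                self._subparts = [
                    LazyPart(chunk)
                    for chunk in _split_on_boundary(self.raw_body, boundary)
                ]
        return self._subparts


raw = (
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "preamble\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "world\n"
    "--XYZ--\n"
)
msg = LazyPart(raw)
```

Nothing below the top-level headers is examined until msg.subparts is first read, and each subpart in turn defers its own nested parts in the same way.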
-Barry