[Email-SIG] Re: Generator.HeaderParsedGenerator

Tim Legant tim at catseye.net
Fri Nov 21 17:37:30 EST 2003


Barry Warsaw <barry at python.org> writes:

> On Sun, 2003-10-05 at 02:51, Jason R. Mastaler wrote:
> > Can the attached patch be considered for inclusion in email?  This
> > issue is a former mimelib tracker item, but those trackers are now
> > disabled.  I've included the previous commentary leading to the patch
> > below.  FWIW, we've been using this in TMDA successfully for months
> > now.
> 
> So I've been thinking a little bit about this recently.  I'm not sure I
> feel comfortable adding this to email 2.x/Python 2.3.  It's definitely a
> new feature that is probably not appropriate for a patch release.

Turns out it's not a complete solution anyhow.  For example, if you
build a new Message from scratch and attach the HeaderParsed Message,
it will still blow up.  We do this in TMDA to generate auto-responses.  I
hacked an unhappy solution around it for now, but it needs to be
addressed "correctly".  The problem is that Generator clones itself to
flatten sub-parts and thus uses the wrong class of generator to
generate the HeaderParsed sub-part.
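
A minimal sketch of that cloning behaviour, written against the
email.generator module in the current standard library (an assumption;
email 2.x may differ in detail).  HeaderParsedGenerator here is only an
empty stand-in for the class in the patch, not the patch itself.

```python
import io
from email.generator import Generator


class HeaderParsedGenerator(Generator):
    """Hypothetical stand-in for the generator added by the patch."""


# A plain Generator is what you get when flattening a freshly built
# outer message.  clone() builds the sub-part generator from the class
# of the generator being cloned, so the sub-part gets a plain Generator.
outer = Generator(io.StringIO())
inner_class = type(outer.clone(io.StringIO()))

# Only when the outer generator is already the special class does the
# sub-part get the special class as well.
special_outer = HeaderParsedGenerator(io.StringIO())
special_inner_class = type(special_outer.clone(io.StringIO()))
```

So a HeaderParsed sub-part attached to an ordinary Message never sees
the generator that knows how to handle it.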

> But there's a deeper issue which we might want to think about for email
> 3.0.  Currently we decide how to render a message by its Content-Type
> header, but that may not be optimal.

We don't actually look at the Content-Type, in at least one case,
which is the cause of another problem Jason just discovered.  We had a
message show up with the following header field:

Content-Type: multipour alternative;
        boundary="57C41D49D2E1982C2C.B7A"

Parser._parsebody assumes that, if it finds a valid boundary, the
message is a multipart message and proceeds to parse it as such.
Generator comes along, asks for the content-type and gets
'text/plain', because Message.get_content_type() can't make any sense
out of 'multipour alternative'.  Unfortunately, Message._payload is a
list of sub-Message objects and Generator._handle_text raises a
TypeError.
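
The mismatch can be reproduced by hand.  The sketch below assumes the
current standard-library email package and builds the list payload
directly rather than going through Parser (whose behaviour may have
changed since); the sub-part content is made up for illustration.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = 'multipour alternative; boundary="57C41D49D2E1982C2C.B7A"'

# Simulate what the parser left behind: a list of sub-Message objects.
part = Message()
part.set_payload("hello\n")  # illustrative body, not from the real message
msg.attach(part)

declared = msg.get_content_type()  # falls back to the default type
boundary = msg.get_boundary()      # the boundary parameter is still readable
has_subparts = msg.is_multipart()  # True: the payload is a list

# Flattening dispatches on the declared type, not on the payload.
try:
    msg.as_string()
    flatten_error = None
except TypeError as exc:
    flatten_error = exc
```

Here the declared type comes back as 'text/plain' while the payload is
a list, which is exactly the combination the text handler cannot cope
with.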

> If we had to resort to the HeaderParser to parse a message, the
> Content-Type header may lie, or at least it won't accurately
> describe the algorithm we should use to flatten the message.

For any HeaderParsed Message that isn't 'text/plain', Content-Type
will definitely be lying.
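
As a small illustration (assuming the HeaderParser in the current
standard library; the sample message is invented), the header says
multipart but the payload is one unparsed string:

```python
from email.parser import HeaderParser

# Invented sample message with a well-formed multipart header.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ--\n"
)

hp_msg = HeaderParser().parsestr(raw)

hp_declared = hp_msg.get_content_type()  # what the header claims
hp_payload = hp_msg.get_payload()        # the body, left as a single string
hp_is_multipart = hp_msg.is_multipart()  # False: no list of sub-parts
```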

> It doesn't make sense to use some other header, or change the
> Content-Type header, so I'm thinking we want individual messages to have
> some other say in how they get flattened, either via attribute setting
> or method call.  Perhaps messages should have an "effective" content
> type which, if present is used instead to determine how to flatten the
> message.

Essentially, the data structure of Message._payload must always "win",
regardless of Content-Type, or there will be errors.  The "effective"
type is one way to implement that...
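
One possible shape for this, purely as a sketch: the function name
effective_content_type and the fallback types it returns are my own
assumptions, not anything that exists in the email package.

```python
from email.message import Message


def effective_content_type(msg):
    """Hypothetical helper: let the payload structure override the header."""
    declared = msg.get_content_type()
    maintype = msg.get_content_maintype()
    payload = msg.get_payload()

    if isinstance(payload, list):
        # A list of sub-parts must be flattened as a container,
        # whatever the header says.
        if maintype in ("multipart", "message"):
            return declared
        return "multipart/mixed"  # assumed fallback for a lying header

    if maintype == "multipart":
        # Header claims multipart but the body is flat (for example a
        # HeaderParsed message); treat it as opaque text.
        return "text/plain"  # assumed fallback

    return declared
```

A generator could consult something like this instead of
get_content_type() when choosing how to flatten, and leave the real
Content-Type header untouched in the output.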

The email package has to work in two different scenarios; in some
cases, email users will want to know about the errors and in other
cases they need a valid RFC 2822 message generated, as close as
possible to the broken original, yet still able to be presented in an
MUA, or re-sent, or whatever.  TMDA clearly falls into the second
category because, as a delivery agent, we can't live with dropped/lost
mail.

I know Matthew has been thinking about an entirely different parsing
framework for email 3.0.  I'm looking forward to that, hoping we can
address some of these issues there.


Tim



