[Email-SIG] Re: Generator.HeaderParsedGenerator

Fri Nov 21 17:37:30 EST 2003

Barry Warsaw <barry at python.org> writes:

> On Sun, 2003-10-05 at 02:51, Jason R.Mastaler wrote:
> > Can the attached patch be considered for inclusion in email?  This
> > issue is a former mimelib tracker item, but those trackers are now
> > disabled.  I've included the previous commentary leading to the patch
> > below.  FWIW, we've been using this in TMDA successfully for months
> > now.
> 
> So I've been thinking a little bit about this recently.  I'm not sure I
> feel comfortable adding this to email 2.x/Python 2.3.  It's definitely a
> new feature that is probably not appropriate for a patch release.

Turns out it's not a complete solution anyhow.  For example, if you
build a new Message from scratch and attach the HeaderParsed Message,
flattening it will still blow up.  We do this in TMDA to generate
auto-responses.  I hacked an unhappy workaround for now, but it needs
to be addressed "correctly".  The problem is that Generator clones
itself to flatten sub-parts, and so uses the wrong class of generator
to generate the HeaderParsed sub-part.
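
To make the cloning point concrete, here is a minimal sketch against the
email package as it ships with current Python 3 (an assumption; the 2.x
code under discussion may differ in detail).  SketchGenerator is a
hypothetical stand-in for a specialised generator, not the class from
the patch.  Generator.clone() is what builds the generator used for each
sub-part, and it reuses the class of the generator doing the flattening,
whatever the sub-part happens to need:

```python
import io
from email.generator import Generator


class SketchGenerator(Generator):
    """Hypothetical stand-in for a specialised generator subclass."""


def clone_class_name(gen):
    # clone() returns the generator that will flatten each sub-part.
    return type(gen.clone(io.StringIO())).__name__


# A plain outer Generator hands every sub-part to another plain Generator,
# even if one of those sub-parts would need the specialised class.
plain = clone_class_name(Generator(io.StringIO()))
special = clone_class_name(SketchGenerator(io.StringIO()))
```

So a special sub-part only gets special treatment if the outermost
generator is already the special class.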

> But there's a deeper issue which we might want to think about for email
> 3.0.  Currently we decide how to render a message by its Content-Type
> header, but that may not be optimal.

In at least one case we don't actually look at the Content-Type at
all, which is the cause of another problem Jason just discovered.  We
had a message show up with the following header field:

Content-Type: multipour alternative;
        boundary="57C41D49D2E1982C2C.B7A"

Parser._parsebody assumes that, if it finds a valid boundary, the
message is a multipart message and proceeds to parse it as such.
Generator comes along, asks for the content-type and gets
'text/plain', because Message.get_content_type() can't make any sense
out of 'multipour alternative'.  Unfortunately, Message._payload is a
list of sub-Message objects, so Generator._handle_text, which expects
a string payload, raises a TypeError.
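
The mismatch can be sketched with the email package in current Python 3
(an assumption; today's parser may not build this state from the raw
text by itself, so the sketch constructs it by hand with attach()).  The
helper names build_mismatched_message and try_flatten are made up for
illustration:

```python
import io
from email.generator import Generator
from email.message import Message


def build_mismatched_message():
    # Outer message: bogus Content-Type, but a list payload (via attach()).
    outer = Message()
    outer['Content-Type'] = ('multipour alternative; '
                             'boundary="57C41D49D2E1982C2C.B7A"')
    part = Message()
    part['Content-Type'] = 'text/plain'
    part.set_payload('first alternative\n')
    outer.attach(part)
    return outer


def try_flatten(msg):
    # Return the TypeError raised while flattening, or None on success.
    try:
        Generator(io.StringIO()).flatten(msg)
    except TypeError as exc:
        return exc
    return None


msg = build_mismatched_message()
reported_type = msg.get_content_type()   # falls back to 'text/plain'
has_list_payload = msg.is_multipart()    # payload is a list of Messages
error = try_flatten(msg)                 # text handler meets a list payload
```

The header says one thing (after the fallback), the payload says
another, and the generator believes the header.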

> If we had to resort to the HeaderParser to parse a message, the
> Content-Type header may lie, or at least it won't accurately
> describe the algorithm we should use to flatten the message.

For any HeaderParsed Message that isn't 'text/plain', Content-Type
will definitely be lying.

> It doesn't make sense to use some other header, or change the
> Content-Type header, so I'm thinking we want individual messages to have
> some other say in how they get flattened, either via attribute setting
> or method call.  Perhaps messages should have an "effective" content
> type which, if present is used instead to determine how to flatten the
> message.

Essentially, the data structure of Message._payload must always "win",
regardless of Content-Type, or there will be errors.  The "effective"
type is one way to implement that...
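
As a rough sketch of "payload wins" (not a proposal for the actual
API), a Generator subclass could dispatch on the shape of the payload
first and only then fall back to Content-Type.  PayloadFirstGenerator is
a hypothetical name, and the sketch leans on _dispatch and
_handle_multipart, which are private hooks of the current Python 3
Generator, so treat both as assumptions:

```python
import io
from email.generator import Generator
from email.message import Message


class PayloadFirstGenerator(Generator):
    """Hypothetical sketch: trust the payload structure over Content-Type."""

    def _dispatch(self, msg):
        # Private hook (assumption: keeps its current CPython behaviour).
        payload = msg.get_payload()
        declared = msg.get_content_maintype()
        if isinstance(payload, list) and declared not in ('multipart', 'message'):
            # A list of sub-messages is flattened as multipart regardless
            # of what the header claims.
            self._handle_multipart(msg)
        else:
            super()._dispatch(msg)


outer = Message()
outer['Content-Type'] = ('multipour alternative; '
                         'boundary="57C41D49D2E1982C2C.B7A"')
part = Message()
part['Content-Type'] = 'text/plain'
part.set_payload('first alternative\n')
outer.attach(part)

buf = io.StringIO()
PayloadFirstGenerator(buf).flatten(outer)
text = buf.getvalue()
```

The broken header is written back out untouched; only the choice of
flattening algorithm changes.  Because clone() reuses the subclass, the
same rule applies to nested parts.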

The email package has to work in two different scenarios: in some
cases, email users will want to know about the errors, and in other
cases they need a valid RFC 2822 message generated, as close as
possible to the broken original, yet still able to be presented in an
MUA, or re-sent, or whatever.  TMDA clearly falls into the second
category because, as a delivery agent, we can't live with dropped/lost
mail.

I know Matthew has been thinking about an entirely different parsing
framework for email 3.0.  I'm looking forward to that, hoping we can
address some of these issues there.

Tim