Completing the email6 API changes.

If you've read my blog (eg: on planet python), you will be aware that I dedicated August to full time email package development. At the beginning of the month I worked out a design proposal for the remaining API additions to the email package, dealing with handling message bodies in a more natural way. I posted this to the email-sig, and got...well, no objections. Barry Warsaw did review it, and told me he had no issues with the overall design, but also had no time for a detailed review. Since one way to see if a design holds together is to document and code it, I decided to go ahead and do so. This resulted in a number of small tweaks, but no major changes. I have at this point completed the coding.

You can view the whole patch at:

    http://bugs.python.org/issue18891

which also links to three layered patches that I posted as I went along, if you prefer somewhat smaller patches.

I think it would be great if I could check this in for alpha2. Since it is going in as an addition to the existing provisional code, the level of review required is not as high as for non-provisional code, I think. But I would certainly appreciate review from anyone so moved, since I haven't gotten any yet. Of course, if there is serious bikeshedding about the API, I won't make alpha2, but that's fine.

The longer term goal, by the way, is to move all of this out of provisional status for 3.5.

This code finishes the planned API additions for the email package to bring it fully into the world of Python3 and unicode. It does not "fix" the deep internals, which could be a future development direction (but probably only after the "old" API has been retired, which will take a while). But it does make it so that you can use the email package without having to be a MIME expert. (You can't get away with *no* MIME knowledge, but you no longer have to fuss with the details of the syntax.)
To give you the flavor of how the entire new provisional API plays together, here's how you can build a complete message in your application:

    from email.message import MIMEMessage
    from email.headerregistry import Address
    fullmsg = MIMEMessage()
    fullmsg['To'] = Address('Foö Bar', 'fbar@example.com')
    fullmsg['From'] = "mè <me@example.com>"
    fullmsg['Subject'] = "j'ai un problème de python."
    fullmsg.set_content("et la il est monté sur moi et il commence"
                        " a m'étouffer.")
    htmlmsg = MIMEMessage()
    htmlmsg.set_content("<p>et la il est monté sur moi et il commence"
                        " a m'étouffer.</p><img src='image1' />",
                        subtype='html')
    with open('python.jpg', 'rb') as python:
        htmlmsg.add_related(python.read(), 'image', 'jpg', cid='image1',
                            disposition='inline')
    fullmsg.make_alternative()
    fullmsg.attach(htmlmsg)
    with open('police-report.txt') as report:
        fullmsg.add_attachment(report.read(), filename='pölice-report.txt',
                               params=dict(wrap='flow'), headers=(
                                    'X-Secret-Level: top',
                                    'X-Authorization: Monty'))

Which results in:

    >>> for line in bytes(fullmsg).splitlines():
    ...     print(line)
    b'To: =?utf-8?q?Fo=C3=B6?= Bar <fbar@example.com>'
    b'From: =?utf-8?q?m=C3=A8?= <me@example.com>'
    b"Subject: j'ai un =?utf-8?q?probl=C3=A8me?= de python."
    b'MIME-Version: 1.0'
    b'Content-Type: multipart/mixed; boundary="===============1710006838=="'
    b''
    b'--===============1710006838=='
    b'Content-Type: multipart/alternative; boundary="===============1811969196=="'
    b''
    b'--===============1811969196=='
    b'Content-Type: text/plain; charset="utf-8"'
    b'Content-Transfer-Encoding: 8bit'
    b''
    b"et la il est mont\xc3\xa9 sur moi et il commence a m'\xc3\xa9touffer."
    b''
    b'--===============1811969196=='
    b'MIME-Version: 1.0'
    b'Content-Type: multipart/related; boundary="===============1469657937=="'
    b''
    b'--===============1469657937=='
    b'Content-Type: text/html; charset="utf-8"'
    b'Content-Transfer-Encoding: quoted-printable'
    b''
    b"<p>et la il est mont=C3=A9 sur moi et il commence a m'=C3=A9touffer.</p><img ="
    b"src=3D'image1' />"
    b''
    b'--===============1469657937=='
    b'MIME-Version: 1.0'
    b'Content-Type: image/jpg'
    b'Content-Transfer-Encoding: base64'
    b'Content-Disposition: inline'
    b'Content-ID: image1'
    b''
    b'ZmFrZSBpbWFnZSBkYXRhCg=='
    b''
    b'--===============1469657937==--'
    b'--===============1811969196==--'
    b'--===============1710006838=='
    b'MIME-Version: 1.0'
    b'X-Secret-Level: top'
    b'X-Authorization: Monty'
    b'Content-Transfer-Encoding: 7bit'
    b"Content-Disposition: attachment; filename*=utf-8''p%C3%B6lice-report.txt"
    b'Content-Type: text/plain; charset="utf-8"; wrap="flow"'
    b''
    b'il est sorti de son vivarium.'
    b''
    b'--===============1710006838==--'

If you've used the email package enough to be annoyed by it, you may notice that there are some nice things going on there, such as using CTE 8bit for the text part by default, and quoted-printable instead of base64 for utf8 when the lines are long enough to need wrapping.

(Hmm. Looking at that I see I didn't fully fix a bug I had meant to fix: some of the parts have a MIME-Version header that don't need it.)

All input strings are unicode, and the library takes care of doing whatever encoding is required. When you pull data out of a parsed message, you get unicode, without having to worry about how to decode it yourself.

On the parsing side, after the above message has been parsed into a message object, we can do:

    >>> print(fullmsg['to'], fullmsg['from'])
    Foö Bar <"fbar@example.com"> mè <me@example.com>
    >>> print(fullmsg['subject'])
    j'ai un problème de python.
    >>> print(fullmsg['to'].addresses[0].display_name)
    Foö Bar
    >>> print(fullmsg.get_body(('plain',)).get_content())
    et la il est monté sur moi et il commence a m'étouffer.
    >>> for part in fullmsg.get_body().iter_parts():
    ...     print(part.get_content())
    <p>et la il est monté sur moi et il commence a m'étouffer.</p><img src='image1' />
    b'fake image data\n'
    >>> for attachment in fullmsg.iter_attachments():
    ...     print(attachment.get_content())
    ...     print(attachment['Content-Type'].params())
    il est sorti de son vivarium.
    {'charset': 'utf-8', 'wrap': 'flow'}

Of course, in a real program you'd actually be checking the mime types via get_content_type() and friends before getting the content and doing anything with it.

Please read the new contentmanager module docs in the patch for full details of the content management part of the above API (and the headerregistry docs if you want to review the (new in 3.3) header parsing part of the above API).

Feedback welcome, here or on the issue.

--David

PS: python jokes courtesy of someone doing a drive-by on #python-dev the other day.
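For readers who want to try the build-then-parse round trip on a current Python, here is a minimal sketch. It assumes the API as it later shipped in the standard library, where the class is spelled EmailMessage rather than the MIMEMessage name proposed in this message; the addresses, file name, and body text are placeholders, and the image bytes are fake.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build: plain body, HTML alternative, one binary attachment.
msg = EmailMessage()
msg['To'] = 'Foö Bar <fbar@example.com>'
msg['From'] = 'mè <me@example.com>'
msg['Subject'] = "j'ai un problème de python."
msg.set_content("il est sorti de son vivarium.")
msg.add_alternative("<p>il est sorti de son vivarium.</p>", subtype='html')
msg.add_attachment(b'fake image data\n', maintype='image', subtype='jpeg',
                   filename='python.jpg')

# Serialize to wire format, then parse it back with the same policy family.
wire = bytes(msg)
parsed = message_from_bytes(wire, policy=policy.default)

# Everything comes back as unicode (or raw bytes for the image part).
subject = str(parsed['subject'])
display_name = parsed['to'].addresses[0].display_name
plain = parsed.get_body(preferencelist=('plain',)).get_content()
attachments = [(part.get_filename(), part.get_content())
               for part in parsed.iter_attachments()]
```

Note that the text body comes back with a trailing newline, so compare it after stripping if exact equality matters.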

R. David Murray writes:
But I would certainly appreciate review from anyone so moved, since I haven't gotten any yet.
I'll try to make time for a serious (but obviously partial) review by Monday. I don't know if this is "serious" bikeshedding, but I have a comment or two on the example:
from email.message import MIMEMessage from email.headerregistry import Address fullmsg = MIMEMessage() fullmsg['To'] = Address('Foö Bar', 'fbar@example.com') fullmsg['From'] = "mè <me@example.com>" fullmsg['Subject'] = "j'ai un problème de python."
This is very nice! *I* *love* it. But (sorry!) I worry that it's not obvious to "naive" users. Maybe it would be useful to have a Message() factory which has one semantic difference from MIMEMessage: it "requires" RFC 822-required headers (or optionally RFC 1036 for news). Eg: # This message will be posted and mailed # These would conform to the latest Draft Standards # and be DKIM-signed fullmsg = Message('rfc822', 'rfc1036', 'dmarc') I'm not sure how "required" would be implemented (perhaps through a .validate() method). So the signature of the API suggested above is Message(*validators, **kw). For MIMEMessage, I think I prefer the name "MIMEPart". To naive users, the idea of MIMEMessages containing MIMEMessages is a bit disconcerting, except in the case of multipart/digest, I think.
fullmsg.set_content("et la il est monté sur moi et il commence" " a m'étouffer.") htmlmsg = MIMEMessage() htmlmsg.set_content("<p>et la il est monté sur moi et il commence" " a m'étouffer.</p><img src='image1' />", subtype='html')
I think I'd like to express the suite above as fullmsg.payload.add_alternative(...) fullmsg.payload.add_alternative(..., subtype='html') This would automatically convert the MIME type of fullmsg to 'multipart/alternative', and .payload to a list where necessary. .set_content() would be available but it's "dangerous" (it could replace an arbitrary multipart -- this would be useful operation to replace it with a textual URL or external-body part). Aside: it occurs to me that the .payload attribute (and other such attributes) could be avoided by the device of using names prefixed by ":" such as ":payload" as keys: "fullmsg[':payload']" since colon is illegal in field names (cf RFC 5322). Probably I've just been writing too much Common Lisp, though.<wink/> I'm not sure whether "payload" is a better name than "content" for that attribute. Now the suite
    with open('python.jpg', 'rb') as python:
        htmlmsg.add_related(python.read(), 'image', 'jpg', cid='image1',
                            disposition='inline')
    fullmsg.make_alternative()
    fullmsg.attach(htmlmsg)
becomes just with open('python.jpg', 'rb') as python: fullmsg.payload['text/html'].add_related(...) At this point, "fullmsg.add_related()" without the .payload attribute would be an error, unless a "insertPart=True" keyword argument were present. With "insertPart=True", a new top-level multipart/related would be interposed with the existing multipart/alternative as its first child, and the argument of add_related as the second. Maybe that's too complicated, but I suspect it's harder for people who think of MIME messages as trees, than for people who think of messages as documents and don't want to hear about mimes other than Marcel Marceau.<wink/> The indexing of the .payload attribute by part type is perhaps too smart for my own good, haven't thought carefully about it. It's plausible, though, since a message with multiple parts of the same type can only have one displayed -- normally that shouldn't happen. OTOH, this wouldn't work without modification for multipart/mixed or multipart/related. Could use Yet Another Keyword Argument, maybe. (BTW, it's really annoying when the text/plain part refers to images etc that are attached only to the text/html part. AFAICT from RFC 2387 it ought to be possible to put the multipart/related part at the top so both text/html and text/plain can refer to it.)
with open('police-report.txt') as report: fullmsg.add_attachment(report.read(), filename='pölice-report.txt', params=dict(wrap='flow'), headers=( 'X-Secret-Level: top', 'X-Authorization: Monty'))
I can't find an RFC that specifies a "wrap" parameter to text/plain. Do you mean RFC 2646 'format="flowed"' here? (A "validate" method could raise a warning on unregistered parameters.)
(Hmm. Looking at that I see I didn't fully fix a bug I had meant to fix: some of the parts have a MIME-Version header that don't need it.)
Another reason why the top-level part should be treated differently in the API. Steve

On Sat, 31 Aug 2013 18:57:56 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
R. David Murray writes:
But I would certainly appreciate review from anyone so moved, since I haven't gotten any yet.
I'll try to make time for a serious (but obviously partial) review by Monday.
Thanks.
I don't know if this is "serious" bikeshedding, but I have a comment or two on the example:
Yeah, you engaged in some serious bikeshedding there ;) I like the idea of a top level part that requires the required headers, and I agree that MIMEPart is better than MIMEMessage for that class. Full validation is something that is currently a "future objective". There's infrastructure to do it, but not all of the necessary knowledge has been coded in yet. I take your point about the relationship between related and alternative not being set in stone. I'll have to think through the consequences of that, but I think it is just a matter of removing a couple error checks and updating the documentation. I'll also have to sit and think through your other ideas (the more extensive bikeshedding :) before I can comment, and I'm heading out to take my step-daughter to her freshman year of college, so I won't be able to do thorough responses until tomorrow. --David

R. David Murray writes:
Full validation is something that is currently a "future objective".
I didn't mean it to be anything else. :-)
There's infrastructure to do it, but not all of the necessary knowledge has been coded in yet.
Well, I assume you already know that there's no way that can ever happen (at least until we abandon messaging entirely): new RFCs will continue to be published. So it needs to be an extensible mechanism, a "pipeline" of checks (Barry would say a "chain of rules", I think). Enjoy your trip!

On Sun, 01 Sep 2013 00:18:59 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
R. David Murray writes:
Full validation is something that is currently a "future objective".
I didn't mean it to be anything else. :-)
There's infrastructure to do it, but not all of the necessary knowledge has been coded in yet.
Well, I assume you already know that there's no way that can ever happen (at least until we abandon messaging entirely): new RFCs will continue to be published. So it needs to be an extensible mechanism, a "pipeline" of checks (Barry would say a "chain of rules", I think).
My idea was to encode as many of the currently known rules as we have the stomach for, and to have a validation flag that you turn on if you want to check your message against those standards. But without that flag the code allows you to set arbitrary parameters and headers. As you say, an extensible mechanism for the validators is a good idea. So I take it back that the infrastructure is in place, since extensibility doesn't exist for that feature yet. --David

On Sat, 31 Aug 2013 18:57:56 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
R. David Murray writes:
But I would certainly appreciate review from anyone so moved, since I haven't gotten any yet.
I'll try to make time for a serious (but obviously partial) review by Monday.
I don't know if this is "serious" bikeshedding, but I have a comment or two on the example:
from email.message import MIMEMessage from email.headerregistry import Address fullmsg = MIMEMessage() fullmsg['To'] = Address('Foö Bar', 'fbar@example.com') fullmsg['From'] = "mè <me@example.com>" fullmsg['Subject'] = "j'ai un problème de python."
This is very nice! *I* *love* it.
But (sorry!) I worry that it's not obvious to "naive" users. Maybe it would be useful to have a Message() factory which has one semantic difference from MIMEMessage: it "requires" RFC 822-required headers (or optionally RFC 1036 for news). Eg:
# This message will be posted and mailed # These would conform to the latest Draft Standards # and be DKIM-signed fullmsg = Message('rfc822', 'rfc1036', 'dmarc')
I'm not sure how "required" would be implemented (perhaps through a .validate() method). So the signature of the API suggested above is Message(*validators, **kw).
Adding new constructor arguments to the existing Message class is possible. However, given the new architecture, the more logical way to do this is to put it in the policy. So currently the idea would be for this to be spelled like this:

    fullmsg = Message(policy=policy.SMTP+policy.strict)

Then what would happen is that when the message is serialized (be it via str(), bytes(), by passing it to smtplib.sendmail or smtplib.sendmessage, or by an explicit call to a Generator), an error would be raised if the minimum required headers are not present.

As I said in an earlier message, currently there's no extensibility mechanism for the validation. If the parser recognizes a defect, whether or not an error is raised is controlled by the policy. But there's no mechanism for adding new defect checking that the parser doesn't already know about, or for issues that are not parse-time defects. (There is currently one non-parsing defect for which there is a custom control: the maximum number of headers of a given type that are allowed to be added to a Message object.)

So we need some way to add additional constraints as well. Probably a list of validation functions that take a Message/MIMEPart as the argument and do a raise if they want to reject the message.

The tricky bit is that currently raise_on_defect means you get an error as soon as a (parsing) defect is discovered. Likewise, if max_count is being enforced for headers, the error is raised as soon as the duplicate header is added. Generating errors early when building messages was one of our original design goals, and *only* detecting problems via validators runs counter to that unless all the validators are called every time an operation is performed that modifies a message. Maybe that would be OK, but it feels ugly.
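The "errors as early as possible" behaviour described here can be seen with the policy objects in the standard library. The following is only a sketch, assuming the shipped EmailMessage class and the policy.SMTP and policy.strict instances; the addresses and the malformed input are made up for illustration.

```python
from email import errors, message_from_string, policy
from email.message import EmailMessage

# Policies combine with "+", as in the spelling used in the message above.
strict_smtp = policy.SMTP + policy.strict

# Building: the duplicate header is rejected at the moment it is added,
# because the To header type allows at most one instance.
msg = EmailMessage(policy=strict_smtp)
msg['To'] = 'fbar@example.com'
try:
    msg['To'] = 'other@example.com'
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

# Parsing: the same malformed input is recorded as a defect under the
# default policy, but raised immediately when raise_on_defect is set.
broken = 'Subject: test\nnot a header\n\nbody\n'
lenient = message_from_string(broken, policy=policy.default)
try:
    message_from_string(broken, policy=policy.strict)
    parse_error = None
except errors.MessageDefect as exc:
    parse_error = type(exc).__name__
```

Neither mechanism lets application code register a new kind of check, which is the gap the validator list discussed in this message would fill.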
For the missing header problem, the custom solution could be to add a 'headers' argument to Message that would allow you to write:

    fullmsg = Message(headers=(
                        Header('Date', email.utils.localtime()),
                        Header('To', Address('Fred', 'abc@xyz.com')),
                        Header('From', Address('Sally', 'fgd@xyz.com')),
                        Header('Subject', 'Foo'),
                        ),
                      policy=policy.SMTP+policy.strict)

This call could then immediately raise an error if not all of the required headers are present. (Header is unfortunately not a good choice of name here because we already have a 'Header' class that has a different API).

Aside: I could also imagine adding a 'content' argument that would let you generate a simple text message via a single call...which means you could also extend this model to specifying the entire message in a single call, if you wrote a suitable content manager function for tuples:

    fullmsg = Message(
        policy=policy.SMTP+policy.strict,
        headers=(
            Header('Date', datetime.datetime.now()),
            Header('To', Address('Fred', 'abc@xyz.com')),
            Header('From', Address('Sally', 'fgd@xyz.com')),
            Header('Subject', 'Foo'),
            ),
        content=(
            (
              'This is the text part',
              (
                '<p>Here is the html</p><img src="image1" />',
                {'image1': b'image data'},
              ),
            ),
            b'attachment data',
        ))

But that is probably a little bit crazy...easier to just write a custom function for your application that calls the message building APIs.

Well, anyway, coming back from that little tangent, it seems to me in summary that if we want to raise errors as soon as possible, we need to add custom mechanisms for the error detection as we figure out each class of error we want to handle, so that the validation gets done right away without adding too much overhead. For parsing defects, that means if you want to control which ones are raised you hook handle_defect and make your decision there (there's no way to *add* parsing defects via that hook). For duplicate headers, you hook header_max_count (or set max_count on your custom header classes).
For missing headers we *could* introduce something like the above headers argument to Message, which would be required in 'strict' mode.

But clearly we should also have a general "validation function" list on the policy, where all the functions would be called before serialization and have an opportunity to reject the message by raising errors. That would provide a generalized, if heavier handed, hook to use when there is no specific hook that provides extensibility.

For missing headers I'm inclined to use a serialization-time check rather than the Message constructor check I speculated about above. My logic is that the message isn't complete until you are ready to send it, so generating an error earlier probably isn't wanted in many cases, nor will generating it at serialization time lose you much debugging information. So I'm inclined to implement it as a default validator in the proposed list of serialization time validators.

It is still probably worthwhile providing a utility function for creating a message in a single call. I think there's even an open issue in the tracker for that. But I'm inclined to postpone that until 3.5.
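To make the serialization-time idea concrete, here is a small sketch of a validator list applied just before a message is turned into bytes. Nothing here is part of the email package: require_headers, serialize_checked, and REQUIRED_HEADERS are hypothetical names, and the list lives in a plain function argument rather than on the policy as proposed above.

```python
from email.message import EmailMessage
from email.utils import localtime

# RFC 5322 requires an origination date and an originator field.
REQUIRED_HEADERS = ('Date', 'From')

def require_headers(msg):
    """Hypothetical default validator: reject messages missing required headers."""
    missing = [name for name in REQUIRED_HEADERS if name not in msg]
    if missing:
        raise ValueError('missing required headers: ' + ', '.join(missing))

def serialize_checked(msg, validators=(require_headers,)):
    """Run every validator, then serialize; any validator may raise to reject."""
    for check in validators:
        check(msg)
    return bytes(msg)

draft = EmailMessage()
draft['From'] = 'me@example.com'
draft['To'] = 'fbar@example.com'
draft.set_content('il est sorti de son vivarium.')

# The draft can be built freely; the error only appears when serializing.
try:
    serialize_checked(draft)
    rejected = False
except ValueError:
    rejected = True

draft['Date'] = localtime()
wire = serialize_checked(draft)
```

Because the check runs only at serialization, an incomplete draft can be passed around and modified without tripping errors, which matches the reasoning given above.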
For MIMEMessage, I think I prefer the name "MIMEPart". To naive users, the idea of MIMEMessages containing MIMEMessages is a bit disconcerting, except in the case of multipart/digest, I think.
Or message/rfc822:

    [...]
    try:
        smtplib.sendmessage(msg)
    except Exception as exc:
        errormsg = MIMEMessage()
        errormsg['To'] = msg['sender'] or msg['from']
        errormsg['From'] = 'robot@mydomain.org'
        errormsg.set_content("I'm sorry, sending failed: {}".format(exc))
        errormsg.add_attachment(msg, disposition='inline')
        smtplib.sendmessage(errormsg)

(Terrible code, but you get the idea :)
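A version of the same idea that runs without a mail server is sketched below. It assumes the shipped EmailMessage class in place of the proposed MIMEMessage; build_error_report is a hypothetical helper, and the actual sending step is left out.

```python
from email.message import EmailMessage

def build_error_report(orig_msg, exc, robot='robot@mydomain.org'):
    """Wrap a failed message in a report addressed to its sender (hypothetical helper)."""
    report = EmailMessage()
    report['To'] = str(orig_msg['sender'] or orig_msg['from'])
    report['From'] = robot
    report['Subject'] = 'Delivery failed'
    report.set_content("I'm sorry, sending failed: {}".format(exc))
    # Passing a message object as content produces a message/rfc822 part.
    report.add_attachment(orig_msg, disposition='inline')
    return report

original = EmailMessage()
original['From'] = 'me@example.com'
original['To'] = 'fbar@example.com'
original['Subject'] = 'original'
original.set_content('il est sorti de son vivarium.')

report = build_error_report(original, RuntimeError('connection refused'))
part_types = [part.get_content_type() for part in report.iter_parts()]
embedded = list(report.iter_parts())[1].get_content()
```

This is the case that makes "a message containing a message" legitimate, and it is why passing a message object to add_attachment or add_alternative cannot simply mean "attach this part as-is".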
fullmsg.set_content("et la il est monté sur moi et il commence" " a m'étouffer.") htmlmsg = MIMEMessage() htmlmsg.set_content("<p>et la il est monté sur moi et il commence" " a m'étouffer.</p><img src='image1' />", subtype='html')
I think I'd like to express the suite above as
fullmsg.payload.add_alternative(...) fullmsg.payload.add_alternative(..., subtype='html')
This would automatically convert the MIME type of fullmsg to 'multipart/alternative', and .payload to a list where necessary. .set_content() would be available but it's "dangerous" (it could replace an arbitrary multipart -- this would be useful operation to replace it with a textual URL or external-body part).
Having an attribute with methods that affect the parent object is not a natural act in Python. (Is it in OO in general?) But aside from that, I'm not sure I see the point of a two level name here. You are still talking about mutating the message model object, which is exactly what is going on in my example: if you call add_alternative on a text part, it gets turned into a multipart/alternative with the first part being the text part and the second part being the thing you just added. That's in the docs but I didn't refer to it explicitly in my example. Does that make things clearer? It is a good point about unintentionally overwriting the existing contents. That would be simple to fix: make it an error to call set_content if payload is not None. Possibly a 'clear' method would then be useful, especially since so many other Python datatypes have one.
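The conversion described here can be checked directly. This sketch assumes the API as it later shipped (EmailMessage instead of MIMEMessage); in that version, to my knowledge, set_content refuses to run on a multipart and clear_content plays the role of the 'clear' method suggested above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("et la il est monté sur moi")
before = msg.get_content_type()

# Calling add_alternative on a text part turns the message into a
# multipart/alternative whose first part is the original text.
msg.add_alternative("<p>et la il est monté sur moi</p>", subtype='html')
after = msg.get_content_type()
parts = [part.get_content_type() for part in msg.iter_parts()]

# Overwriting an existing multipart by accident is an error...
try:
    msg.set_content("oops")
    overwrite_refused = False
except TypeError:
    overwrite_refused = True

# ...unless the content is cleared explicitly first.
msg.clear_content()
msg.set_content("replaced on purpose")
final = msg.get_content_type()
```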
Aside: it occurs to me that the .payload attribute (and other such attributes) could be avoided by the device of using names prefixed by ":" such as ":payload" as keys: "fullmsg[':payload']" since colon is illegal in field names (cf RFC 5322). Probably I've just been writing too much Common Lisp, though.<wink/>
I think maybe so :). I'd rather write 'msg.payload' than "msg[':payload']". So I don't have a motivation to "avoid" those attributes. (Nor would I find msg[':payload'].somemethod mutating the parent object any less surprising.)
I'm not sure whether "payload" is a better name than "content" for that attribute.
Now the suite
    with open('python.jpg', 'rb') as python:
        htmlmsg.add_related(python.read(), 'image', 'jpg', cid='image1',
                            disposition='inline')
    fullmsg.make_alternative()
    fullmsg.attach(htmlmsg)
becomes just
with open('python.jpg', 'rb') as python: fullmsg.payload['text/html'].add_related(...)
This doesn't work, though, because you could (although you usually won't) have more than one 'text/html' part in a single multipart.

The reason behind the structure of my example is to avoid having to think about indexing into already existing parts. It *could* have been written:

    fullmsg.add_alternative(...)
    fullmsg.add_alternative(..., subtype='html')
    with open('python.jpg', 'rb') as python:
        fullmsg.get_payload()[1].add_related(...)

But that means you have to think about payload lists and order of parts and figure out the right index to use. But perhaps some people will find that more congenial than the make_alternative() dance, and I should probably make sure that I include that alternative in the examples (that I haven't yet written for the docs). (I dislike the 'get_payload' API, by the way.)

The awkwardness of the code in my example arises from the fact that a Message object is a valid thing to want to use as content (see message/rfc822 above), so I can't have the code just assume that:

    fullmsg.add_alternative(htmlmsg)

means "make fullmsg into a multipart/alternative and attach htmlmsg", since in the logic of my API what it naturally means is "make fullmsg into a multipart/alternative and attach a new message/rfc822 part containing htmlmsg". Thus the make/attach dance instead.

However, what I really want, as I mentioned in my original proposal but have dropped from my 3.4 proposed code addition, is:

    with open('python.jpg', 'rb') as python:
        fullmsg.add_alternative(
            Webpage('<p>example</p><img src=image1 />',
                    {'image1': python.read()}
            ))

I dropped it from the code proposal because it seems to me that we should give the community an opportunity to experiment with the content manager interface before we decide what the best stuff is to include in the stdlib.
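The index-based spelling mentioned above can be written out in full as follows. This is a sketch against the API as it later shipped (EmailMessage rather than MIMEMessage), with placeholder text and fake image bytes standing in for python.jpg.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("et la il est monté sur moi")
msg.add_alternative("<p>et la il est monté sur moi</p>"
                    "<img src='cid:image1' />", subtype='html')

# Index 1 is the HTML part because it was added second; add_related
# converts that part into a multipart/related holding the HTML and the image.
html_part = msg.get_payload()[1]
html_part.add_related(b'fake image data\n', 'image', 'jpeg', cid='<image1>')

# Depth-first listing of the resulting MIME tree.
tree = [part.get_content_type() for part in msg.walk()]
```

The cost of this form is exactly the one named above: the caller has to know that the HTML part sits at index 1.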
At this point, "fullmsg.add_related()" without the .payload attribute would be an error, unless a "insertPart=True" keyword argument were present. With "insertPart=True", a new top-level multipart/related would be interposed with the existing multipart/alternative as its first child, and the argument of add_related as the second. Maybe that's too complicated, but I suspect it's harder for people who think of MIME messages as trees, than for people who think of messages as documents and don't want to hear about mimes other than Marcel Marceau.<wink/>
I *think* I see what you are suggesting, but I don't see that it is easier for the non-tree thinker than my proposal. My proposal doesn't even let you make the conversion your insertPart=True would produce. (That could indeed be a bug, so we may want a "I know what I'm doing" keyword for 'make_related' and friends).
The indexing of the .payload attribute by part type is perhaps too smart for my own good, haven't thought carefully about it. It's plausible, though, since a message with multiple parts of the same type can only have one displayed -- normally that shouldn't happen. OTOH, this wouldn't work without modification for multipart/mixed or multipart/related. Could use Yet Another Keyword Argument, maybe.
Ah, good point. But the mechanism doesn't generalize to other types (eg: you can have any number of 'image/jpg'), which would make the API quite inconsistent (you can index by 'text/html' but not by 'image/jpeg'? Wat?)
(BTW, it's really annoying when the text/plain part refers to images etc that are attached only to the text/html part. AFAICT from RFC 2387 it ought to be possible to put the multipart/related part at the top so both text/html and text/plain can refer to it.)
Is there a text part format that can actually refer to related parts? text/plain isn't one, as far as I know. As I said in another message, it is worth thinking about supporting more unusual structures. The tradeoff is not getting an error when you do something that is in most cases a mistake. Absent information about a reasonably common text format that actually embeds images, I'm inclined to keep the restriction. We can always remove it later, but we can't add it back later.
with open('police-report.txt') as report: fullmsg.add_attachment(report.read(), filename='pölice-report.txt', params=dict(wrap='flow'), headers=( 'X-Secret-Level: top', 'X-Authorization: Monty'))
I can't find an RFC that specifies a "wrap" parameter to text/plain. Do you mean RFC 2646 'format="flowed"' here?
Yes, I did. I just didn't bother to look it up (my apologies), since the point of the example was that you could add arbitrary extra parameters.
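With the parameter name corrected to the RFC 2646 spelling, the example might look like the sketch below. It assumes the API as it later shipped, where the class is EmailMessage and params is an attribute of the parsed Content-Type header rather than a method; the file contents and header values are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("see attached")
msg.add_attachment("il est sorti de son vivarium.\n",
                   filename='police-report.txt',
                   # Extra Content-Type parameters are passed through as given.
                   params={'format': 'flowed'},
                   headers=['X-Secret-Level: top',
                            'X-Authorization: Monty'])

report = list(msg.iter_attachments())[0]
content_type_params = dict(report['Content-Type'].params)
secret_level = str(report['X-Secret-Level'])
```

Nothing in this call checks that 'format' is a registered parameter for text/plain, which is the point being made: arbitrary parameters are accepted unless some validation step objects.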
(A "validate" method could raise a warning on unregistered parameters.)
Yes, which if we want it to happen at the time the parameter is set, means we probably want a custom (extendible) mechanism for this in the policy.
(Hmm. Looking at that I see I didn't fully fix a bug I had meant to fix: some of the parts have a MIME-Version header that don't need it.)
Another reason why the top-level part should be treated differently in the API.
True. And if we add MIMEPart but keep MIMEMessage as the top level part rather than re-using the existing Message, we can make the MIMEMessage policy strict by default. --David

On 9/1/2013 3:10 PM, R. David Murray wrote:
This doesn't work, though, because you could (although you usually won't) have more than one 'text/html' part in a single multipart.
I was traveling and your original message is still unread in my queue of "things to look at later" :( I haven't caught up with old stuff yet, but am trying to stay current on current stuff... The quoted issue was mentioned in another message in this thread, though in different terms. I recall being surprised when first seeing messages generated by Apple Mail software, that are multipart/related, having a sequence of intermixed text/plain and image/jpeg parts. This is apparently how Apple Mail generates messages that have inline pictures, without resorting to use of HTML mail. Other email clients handle this relatively better or worse, depending on the expectations of their authors! Several of them treat all the parts after the initial text/html part as attachments; some of them display inline attachments if they are text/html or image/jpeg and others do not. I can't say for sure if there are other ways they are treated; I rather imagine that Apple Mail displays the whole message with interspersed pictures quite effectively, without annoying the user with attachment "markup", but I'm not an Apple Mail user so I couldn't say for sure. You should, of course, ensure that it is possible to create such a message. Whether Apple Mail does that with other embedded image/* formats, or with other text/* formats, or other non-image, non-text formats, I couldn't say. I did attempt to determine if it was non-standard usage: it is certainly non-common usage, but I found nothing in the email/MIME RFCs that precludes such usage.

This is getting off-topic IMO; we should probably take this thread to email-sig. Glenn Linderman writes:
I recall being surprised when first seeing messages generated by Apple Mail software, that are multipart/related, having a sequence of intermixed text/plain and image/jpeg parts. This is apparently how Apple Mail generates messages that have inline pictures, without resorting to use of HTML mail.
(Are you sure you mean "text/plain" above? I've not seen this form of message. And you mention only "text/html" below.) This practice (like my suggestion) is based on the conjecture that MUAs that implement multipart/related will treat it as multipart/mixed if the "main" subpart isn't known to implement links to external entities.
Other email clients handle this relatively better or worse, depending on the expectations of their authors!
Sure. After all, this is a world in which some MUAs have a long history of happily executing virus executables.
I did attempt to determine if it was non-standard usage: it is certainly non-common usage, but I found nothing in the email/MIME RFCs that precludes such usage.
Clearly RFCs 2046 and 2387 envision a fallback to multipart/mixed, but are silent on how to do it for MUAs that implement multipart/related. RFC 2387 says: MIME User Agents that do recognize Multipart/Related entities but are unable to process the given type should give the user the option of suppressing the entire Multipart/Related body part shall be. [...] Handling Multipart/Related differs [from handling of existing composite subtypes] in that processing cannot be reduced to handling the individual entities. I think that the sane policy is that when processing multipart/related internally, the MUA should treat the whole as multipart/mixed, unless it knows how links are implemented in the "start" part. But the RFC doesn't say that.
Several of them treat all the parts after the initial text/html part as attachments;
They don't implement RFC 2387 (promoted to draft standard in 1998, following two others, the earlier being RFC 1872 from 1995). Too bad for their users.

But what I'm worried about is a different issue, which is how to ensure that multipart/alternative messages present all relevant content entities in both presentations. For example, the following hypothetical structure is efficient:

    multipart/alternative
        text/plain
        multipart/related
            text/html
            application/x-opentype-font

because the text/plain can't use the font. But this

    multipart/alternative
        text/plain
        multipart/related
            text/html
            image/png
            image/png

often costs the text/plain receiver a view of the images, and I don't see any way to distinguish the two cases. (The images might be character glyphs, for example, effectively a "poor man's font".) OTOH, if the message is structured

    multipart/related
        multipart/alternative
            text/plain
            text/html
        image/png
        image/png

the receiver can infer that the images are related to both text/* parts and DTRT for each.

Steve
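For anyone who wants to experiment with the last structure (multipart/related on top), here is one way it might be assembled with the classes that later shipped in the standard library, EmailMessage and MIMEPart. It is only a sketch: the text, the content id, and the image bytes are placeholders, and it goes through the lower-level attach() because the convenience methods refuse the direct conversion.

```python
from email.message import EmailMessage, MIMEPart

# Inner multipart/alternative: the two renderings of the same text.
alt = MIMEPart()
alt.set_content("see the glyph image")
alt.add_alternative("<p>see <img src='cid:glyph1' /></p>", subtype='html')

# The high-level API does not allow wrapping an alternative in a related.
try:
    alt.make_related()
    conversion_refused = False
except ValueError:
    conversion_refused = True

# So start from an empty multipart/related and attach the pieces.
outer = EmailMessage()
outer.make_related()
outer.attach(alt)
outer.add_related(b'fake png data', 'image', 'png', cid='<glyph1>')

tree = [part.get_content_type() for part in outer.walk()]
```

Whether mail clients display such a message sensibly is a separate question, as the rest of this thread makes clear.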

On 9/1/2013 8:03 PM, Stephen J. Turnbull wrote:
This is getting off-topic IMO; we should probably take this thread to email-sig.
Probably, but you didn't :)
Glenn Linderman writes:
I recall being surprised when first seeing messages generated by Apple Mail software, that are multipart/related, having a sequence of intermixed text/plain and image/jpeg parts. This is apparently how Apple Mail generates messages that have inline pictures, without resorting to use of HTML mail.
(Are you sure you mean "text/plain" above? I've not seen this form of message. And you mention only "text/html" below.)
Yes, I'm sure it was text/plain. I may be able to access the archived discussion from a non-Python mailing list about it, to verify, if that becomes important. But now that you mention multipart/mixed, I'm not sure if it was multipart/related or multipart/mixed for the grouping MIME part. Perhaps someone with Apple Mail could produce one... probably by composing a message as text/plain, and dragging in a picture or two. The other references to text/html were in error.
This practice (like my suggestion) is based on the conjecture that MUAs that implement multipart/related will treat it as multipart/mixed if the "main" subpart isn't known to implement links to external entities.
Other email clients handle this relatively better or worse, depending on the expectations of their authors!
Sure. After all, this is a world in which some MUAs have a long history of happily executing virus executables.
I did attempt to determine if it was non-standard usage: it is certainly non-common usage, but I found nothing in the email/MIME RFCs that precludes such usage.
Clearly RFCs 2046 and 2387 envision a fallback to multipart/mixed, but are silent on how to do it for MUAs that implement multipart/related. RFC 2387 says:
MIME User Agents that do recognize Multipart/Related entities but are unable to process the given type should give the user the option of suppressing the entire Multipart/Related body part shall be. [...] Handling Multipart/Related differs [from handling of existing composite subtypes] in that processing cannot be reduced to handling the individual entities.
I think that the sane policy is that when processing multipart/related internally, the MUA should treat the whole as multipart/mixed, unless it knows how links are implemented in the "start" part. But the RFC doesn't say that.
Several of them treat all the parts after the initial text/html part as attachments;
They don't implement RFC 2387 (promoted to draft standard in 1998, following two others, the earlier being RFC 1872 from 1995). Too bad for their users.
Correct... but the MUA receiving the Apple Mail message I was talking about being a text-mostly MUA, it is probably a reasonable method of handling them.
But what I'm worried about is a different issue, which is how to ensure that multipart/alternative messages present all relevant content entities in both presentations. For example, the following hypothetical structure is efficient:
    multipart/alternative
        text/plain
        multipart/related
            text/html
            application/x-opentype-font
because the text/plain can't use the font. But this
    multipart/alternative
        text/plain
        multipart/related
            text/html
            image/png
            image/png
often costs the text/plain receiver a view of the images, and I don't see any way to distinguish the two cases. (The images might be character glyphs, for example, effectively a "poor man's font".)
Yes, that issue is handled by some text MUAs by showing the image/png (or anything in such a position) as attachments. Again, being text-mostly, that might be a reasonable way of handling them. Perhaps the standard says they should be ignored when displaying the text/plain alternative.
OTOH, if the message is structured
    multipart/related
        multipart/alternative
            text/plain
            text/html
        image/png
        image/png
the receiver can infer that the images are related to both text/* parts and DTRT for each.
With the images being treated as attachments. Or is there a syntax to allow the text/html to embed the images and the text/plain to see them as attachments? I think the text/html wants to refer to things within its containing multipart/related, but am not sure if that allows the intervening multipart/alternative.

Glenn writes:
Steve writes:
OTOH, if the message is structured
    multipart/related
        multipart/alternative
            text/plain
            text/html
        image/png
        image/png
the receiver can infer that the images are related to both text/* parts and DTRT for each.
With the images being treated as attachments. Or is there a syntax to allow the text/html to embed the images and the text/plain to see them as attachments?
I believe the above is that syntax. But the standard doesn't say anything about this. The standard for multipart/alternative is RFC 2046, which doesn't know about multipart/related. RFC 2387 doesn't update RFC 2046, so it doesn't say anything about multipart/alternative within multipart/related, either.
I think the text/html wants to refer to things within its containing multipart/related, but am not sure if that allows the intervening multipart/alternative.
I don't see why not. But it would depend on the implementations, which we'll have to test before recommending the structure I (theoretically :-) prefer.

On Mon, 02 Sep 2013 16:06:53 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
Glenn writes:
Steve writes:
OTOH, if the message is structured
    multipart/related
        multipart/alternative
            text/plain
            text/html
        image/png
        image/png
the receiver can infer that the images are related to both text/* parts and DTRT for each.
With the images being treated as attachments. Or is there a syntax to allow the text/html to embed the images and the text/plain to see them as attachments?
I believe the above is that syntax. But the standard doesn't say anything about this. The standard for multipart/alternative is RFC 2046, which doesn't know about multipart/related. RFC 2387 doesn't update RFC 2046, so it doesn't say anything about multipart/alternative within multipart/related, either.
I think the text/html wants to refer to things within its containing multipart/related, but am not sure if that allows the intervening multipart/alternative.
I don't see why not. But it would depend on the implementations, which we'll have to test before recommending the structure I (theoretically :-) prefer.
I'm still not understanding how the text/plain part *refers* to the related parts. I can understand the structure Glenn found in Apple Mail: a series of text/plain parts interspersed with image/jpg, with all parts after the first being marked 'Content-Disposition: inline'. Any MUA that can display text and images *ought* to handle that correctly and produce the expected result. But that isn't what your structure above would produce. If you did:

multipart/related multipart/alternative text/html text/plain image/png text/plain image/png text/plain

and only referred to the png parts in the text/html part and marked all the parts as 'inline' (even though that is irrelevant in the text/html related case), an MUA that *knew* about this technique *could* display it "correctly", but an MUA that is just following the standards most likely won't.

I don't see any way short of duplicating the image parts to make it likely that a typical MUA would display images for both a text/plain sequence and a text/html related part. On the other hand, my experience with MUAs is actually quite limited :)

Unless there is some standard for referring to related parts in a text/plain part? I'm not aware of any, but you have much more experience with this stuff than I do. (Even text/enriched (RFC 1896) doesn't seem to have one, though of course there could be "extensions" that define both that and the font support you used as an example.)

--David
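The interleaved text-and-image layout attributed to Apple Mail above can be approximated as follows. This is only a sketch under stated assumptions: it uses `email.message.EmailMessage` (the name under which the `MIMEMessage` class from this thread later shipped in the standard library), it picks multipart/mixed as the container (the thread is unsure whether the original used mixed or related), and the JPEG bytes and message text are placeholders.

```python
from email.message import EmailMessage

JPEG = b"\xff\xd8\xff\xe0"  # placeholder bytes, not a real image

msg = EmailMessage()
msg["Subject"] = "Holiday pictures"

# The first text part carries no Content-Disposition header.
msg.set_content("Here is the first picture:")

# Every later part, text or image, is marked 'inline' so that a reader
# which honours the header shows the parts in sequence.
msg.add_attachment(JPEG, "image", "jpeg", disposition="inline")
msg.add_attachment("And here is the second one:", disposition="inline")
msg.add_attachment(JPEG, "image", "jpeg", disposition="inline")

for part in msg.iter_parts():
    print(part.get_content_type(), part.get_content_disposition())
```

Using `add_attachment` with an explicit `disposition` keeps the container as multipart/mixed, which is the fallback interpretation most of this discussion assumes a receiver will apply anyway.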

On 9/2/2013 2:40 PM, R. David Murray wrote:
I'm still not understanding how the text/plain part *refers* to the related parts.
I don't think the text/plain part can refer to the related parts, but, like you, am willing to be educated if there is a way; but while the text/html may be able to if things like cid: URIs can reach up a level in a given MUA, the text/plain would be left with the additional parts being attachments, methinks. This is less interesting than the technique Apple Mail uses, but more interesting than not even seeing the attached pictures.

MUAs tend to be able to display what they produce themselves, but I have situations where they don't handle what other MUAs produce.

One nice thing about this email6 toolkit might be the ability to produce, more easily than before, a variety of MIME combinations to exercise and test a variety of MUAs. While perhaps most of them have been tested with some obviously standard MIME combinations, I suspect most of them will produce strange results with combinations that are out of the ordinary.

On Mon, 02 Sep 2013 15:52:59 -0700, Glenn Linderman <v+python@g.nevcal.com> wrote:
MUAs tend to be able to display what they produce themselves, but I have situations where they don't handle what other MUAs produce.
One nice thing about this email6 toolkit might be the ability to produce, more easily than before, a variety of MIME combinations to exercise and test a variety of MUAs. While perhaps most of them have been tested with some obviously standard MIME combinations, I suspect most of them will produce strange results with combinations that are out of the ordinary.
Yeah, RFC compliance and other types of testing is something I want this package to be good for. The API under discussion here, though, is oriented toward people using the library for easily generating emails from their application and/or easily accessing the information from emails sent to their application. --David

R. David Murray writes:
I'm still not understanding how the text/plain part *refers* to the related parts.
Like this: "Check out this picture of my dog!"

Or this: "The terms of the contract are found in the attached PDF. Please print it and sign it, then return it by carrier pigeon (attached)."

With this structure

    multipart/alternative
        text/plain
        multipart/related
            text/html
            application/pdf
            application/rfc6214-transport

the rendering of the text/plain part will not show evidence of the PDF at all (eg, a view/download button), at least in some of the MUAs I've tested. And it *should* not, in an RFC-conforming MUA.
I can understand the structure Glenn found in Apple Mail: a series of text/plain parts interspersed with image/jpg, with all parts after the first being marked 'Content-Disposition: inline'. Any MUA that can display text and images *ought* to handle that correctly and produce the expected result. But that isn't what your structure above would produce. If you did:
multipart/related multipart/alternative text/html text/plain image/png text/plain image/png text/plain
and only referred to the png parts in the text/html part and marked all the parts as 'inline' (even though that is irrelevant in the text/html related case), an MUA that *knew* about this technique *could* display it "correctly", but an MUA that is just following the standards most likely won't.
OK, I see that now. It requires non-MIME information about the treatment of the root entity by the implementation. On the other hand, it shouldn't *hurt*. RFC 2387 explicitly specifies that at least some parts of a contained multipart/related part should be able to refer to entities related via the containing multipart/related. Since it does not mention *any* restrictions on contained root entities, I take it that it implicitly specifies that any contained multipart may make such references. But I suspect it's not implemented by most MUAs. I'll have to test.
I don't see any way short of duplicating the image parts to make it likely that a typical MUA would display images for both a text/plain sequence and a text/html related part. On the other hand, my experience with MUAs is actually quite limited :)
Unless there is some standard for referring to related parts in a text/plain part?
No, the whole point is that we MUA implementers *know* that there is no machine-parsable way to refer to the related parts in text/plain, and therefore the only way to communicate even the *presence* of the attachment in

    multipart/related
        text/plain
        image/jpeg; name="dog-photo.jpg"

to the receiving user is to make an exception in the implementation and treat it as multipart/mixed.[1] It *does* make sense, i.e., doesn't require any information not already available to the implementation.

I wonder if use of external bodies could avoid the duplication in current implementations. Probably too fragile, though.

Footnotes:
[1] This is conformant to the RFC, as the mechanism of "relation" is explicitly application-dependent.
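As a rough illustration of the receiving side, the sketch below asks the standard library what it considers the text/plain body and what it considers attachments for the two layouts debated in this thread. It assumes `email.message.EmailMessage` (the released name of the `MIMEMessage` class discussed here) and its `get_body` and `iter_attachments` methods; the builder functions and placeholder bytes are hypothetical. It shows how that one library classifies the parts, not how any MUA renders them.

```python
from email.message import EmailMessage

PNG = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
HTML = "<p>Check out this picture of my dog! <img src='cid:dog'></p>"
TEXT = "Check out this picture of my dog!"


def alternative_outermost():
    msg = EmailMessage()
    msg.set_content(TEXT)
    msg.add_alternative(HTML, subtype="html")
    # Turn the HTML part into multipart/related with the image inside it.
    msg.get_payload()[1].add_related(PNG, "image", "png", cid="<dog>")
    return msg


def related_outermost():
    alt = EmailMessage()
    alt.set_content(TEXT)
    alt.add_alternative(HTML, subtype="html")
    msg = EmailMessage()
    msg.make_related()   # empty multipart/related container
    msg.attach(alt)      # root part is the multipart/alternative
    msg.add_related(PNG, "image", "png", cid="<dog>")
    return msg


def plain_view(msg):
    """Return (body text, content types of parts offered as attachments)."""
    body = msg.get_body(preferencelist=("plain",))
    extras = [part.get_content_type() for part in msg.iter_attachments()]
    return body.get_content().strip(), extras


print(plain_view(alternative_outermost()))
print(plain_view(related_outermost()))
```

With the alternative outermost, the library offers the text-only reader no attachments at all; with the related container outermost, the image is offered alongside the same plain body. That mirrors the argument made above, under the assumption that a receiver falls back to attachment-style handling.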

On Tue, 03 Sep 2013 10:56:36 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
R. David Murray writes:
I can understand the structure Glenn found in Apple Mail: a series of text/plain parts interspersed with image/jpg, with all parts after the first being marked 'Content-Disposition: inline'. Any MUA that can display text and images *ought* to handle that correctly and produce the expected result. But that isn't what your structure above would produce. If you did:
multipart/related multipart/alternative text/html text/plain image/png text/plain image/png text/plain
and only referred to the png parts in the text/html part and marked all the parts as 'inline' (even though that is irrelevant in the text/html related case), an MUA that *knew* about this technique *could* display it "correctly", but an MUA that is just following the standards most likely won't.
OK, I see that now. It requires non-MIME information about the treatment of the root entity by the implementation. On the other hand, it shouldn't *hurt*. RFC 2387 explicitly specifies that at least some parts of a contained multipart/related part should be able to refer to entities related via the containing multipart/related. Since it does not mention *any* restrictions on contained root entities, I take it that it implicitly specifies that any contained multipart may make such references. But I suspect it's not implemented by most MUAs. I'll have to test.
OK, I see what you are driving at now. Whether or not it works is dependent on whether or not typical MUAs handle a multipart/related with a text/plain root part by treating it as if it were a multipart/mixed with inline or attachment sub-parts. So yes, whether or not we should support and/or document this technique very much depends on whether or not typical MUAs do so. I will, needless to say, be very interested in the results of your research :) --David

On Tue, 03 Sep 2013 10:01:42 -0400, "R. David Murray" <rdmurray@bitdance.com> wrote:
On Tue, 03 Sep 2013 10:56:36 +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
R. David Murray writes:
I can understand the structure Glenn found in Apple Mail: a series of text/plain parts interspersed with image/jpg, with all parts after the first being marked 'Content-Disposition: inline'. Any MUA that can display text and images *ought* to handle that correctly and produce the expected result. But that isn't what your structure above would produce. If you did:
multipart/related multipart/alternative text/html text/plain image/png text/plain image/png text/plain
and only referred to the png parts in the text/html part and marked all the parts as 'inline' (even though that is irrelevant in the text/html related case), an MUA that *knew* about this technique *could* display it "correctly", but an MUA that is just following the standards most likely won't.
OK, I see that now. It requires non-MIME information about the treatment of the root entity by the implementation. On the other hand, it shouldn't *hurt*. RFC 2387 explicitly specifies that at least some parts of a contained multipart/related part should be able to refer to entities related via the containing multipart/related. Since it does not mention *any* restrictions on contained root entities, I take it that it implicitly specifies that any contained multipart may make such references. But I suspect it's not implemented by most MUAs. I'll have to test.
OK, I see what you are driving at now. Whether or not it works is dependent on whether or not typical MUAs handle a multipart/related with a text/plain root part by treating it as if it were a multipart/mixed
I meant "a text/plain root part *inside* a multipart/alternative", which is what you said, I just didn't understand it at first :) Although I wonder how many GUI MUAs do the fallback to multipart/mixed with just a normal text/plain root part, too. I would expect a text-only MUA would, since it has no other way to display a multipart/related...but a graphical MUA might just assume that there will always be an html part in a multipart/related.
with inline or attachment sub-parts. So yes, whether or not we should support and/or document this technique very much depends on whether or not typical MUAs do so. I will, needless to say, be very interested in the results of your research :)
--David
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev

R. David Murray writes:
I meant "a text/plain root part *inside* a multipart/alternative", which is what you said, I just didn't understand it at first :) Although I wonder how many GUI MUAs do the fallback to multipart/mixed with just a normal text/plain root part, too. I would expect a text-only MUA would, since it has no other way to display a multipart/related...but a graphical MUA might just assume that there will always be an html part in a multipart/related.
It's not really a problem with text vs. GUI, or an assumption of HTML. There are plenty of formats that have such links, and some which don't have links, but rather assigned roles such as "Mac files" (with data fork and resource fork) and digital signatures (though that turned out to be worth designing a new multipart subtype).

The problem is that "multipart/related" says "pass all the part entities to the handler appropriate to the root part entity, which will process the links found in the root part entity". If you implement that in the natural way, you just pass the text/plain part to the text/plain handler, which won't find any links for the simple reason that it has no protocol for representing them.

This means that the kind of multipart/related handler I envision needs to implement linking itself, rather than delegate it to the root part handler. This requires checking the type of the root part:

    # not intended to look like Email API
    def handle_multipart_related(part_list, root_part):
        if root_part.content_type in ['text/plain']:
            # just display the parts in order
            handle_multipart_mixed(part_list)
        else:
            # cid -> entities in internal representation
            entity_map = extract_entity_map(part_list)
            root_part.content_type.handle(root_part, entity_map)
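The dispatch described above can be made runnable against the standard library, as a sketch only. It assumes `email.message.EmailMessage` (the released name of the class this thread calls `MIMEMessage`), takes the first subpart as the root and ignores the "start" parameter, and returns a description of what to do instead of rendering anything. The function signature and the return shape are assumptions of this sketch, not part of the pseudocode or of any MUA.

```python
from email.message import EmailMessage


def handle_multipart_related(msg):
    """Decide how to present a multipart/related message.

    Returns (mode, parts, cid_map):
      - ("mixed", all parts in order, {}) when the root is text/plain,
        since plain text has no way to link to the other parts;
      - ("linked", [root], {cid: part}) otherwise, so that the root's own
        handler can resolve cid: references.
    Simplification: the root is taken to be the first part; a 'start'
    parameter on the container is not consulted.
    """
    parts = list(msg.iter_parts())
    root = parts[0]
    if root.get_content_type() == "text/plain":
        return "mixed", parts, {}
    cid_map = {
        str(part["Content-ID"]).strip("<>"): part
        for part in parts[1:]
        if part["Content-ID"] is not None
    }
    return "linked", [root], cid_map


JPEG = b"\xff\xd8\xff\xe0"  # placeholder bytes, not a real image

text_root = EmailMessage()
text_root.set_content("Check out this picture of my dog!")
text_root.add_related(JPEG, "image", "jpeg", cid="<dog>")

html_root = EmailMessage()
html_root.set_content("<p><img src='cid:dog'></p>", subtype="html")
html_root.add_related(JPEG, "image", "jpeg", cid="<dog>")

for message in (text_root, html_root):
    mode, parts, cid_map = handle_multipart_related(message)
    print(mode, [p.get_content_type() for p in parts], sorted(cid_map))
```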

On 31/08/13 15:21, R. David Murray wrote:
If you've read my blog (eg: on planet python), you will be aware that I dedicated August to full time email package development. [...]
The API looks really nice! Thank you for putting this together. A question comes to mind though:
All input strings are unicode, and the library takes care of doing whatever encoding is required. When you pull data out of a parsed message, you get unicode, without having to worry about how to decode it yourself.
How well does your library cope with emails where the encoding is declared wrongly? Or no encoding declared at all?

Conveniently, your email is an example of this. Although it contains non-ASCII characters, it is declared as us-ascii:

    --===============1633676851==
    Content-Type: text/plain; charset="us-ascii"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Content-Disposition: inline

which may explain why Stephen Turnbull's reply contains mojibake.

-- 
Steven

On Sat, 31 Aug 2013 20:37:30 +1000, Steven D'Aprano <steve@pearwood.info> wrote:
On 31/08/13 15:21, R. David Murray wrote:
If you've read my blog (eg: on planet python), you will be aware that I dedicated August to full time email package development. [...]
The API looks really nice! Thank you for putting this together.
Thanks.
A question comes to mind though:
All input strings are unicode, and the library takes care of doing whatever encoding is required. When you pull data out of a parsed message, you get unicode, without having to worry about how to decode it yourself.
How well does your library cope with emails where the encoding is declared wrongly? Or no encoding declared at all?
It copes as best it can :) The bad bytes are preserved (unless you modify a part) but are returned as the "unknown character" in a string context. You can get the original bytes out by using the bytes access interface. (There are probably some places where how to do that isn't clear in the current API, but basically either you use BytesGenerator or you drop down to a lower level API.) An attempt is made to interpret "bad bytes" as utf-8, before giving up and replacing them with the 'unknown character' character. I'm not 100% sure that is a good idea.
Conveniently, your email is an example of this. Although it contains non-ASCII characters, it is declared as us-ascii:
Oh, yeah, my MUA is a little quirky and I forgot the step that would have made that correct. Wanting to rewrite it is one of the reasons I embarked on this whole email thing a few years ago :) --David
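The handling of a wrongly declared charset described in this reply can be tried out with a small sketch. It assumes the released standard-library API (`email.message_from_bytes` with `policy.default`, which yields an `EmailMessage`); the sample message is made up for illustration, with a Latin-1 byte in a body declared as us-ascii.

```python
from email import message_from_bytes, policy

# A body declared as us-ascii / 7bit that nevertheless contains the
# Latin-1 byte 0xE8 (illustrative input, not taken from the thread).
raw = (
    b'Content-Type: text/plain; charset="us-ascii"\n'
    b"MIME-Version: 1.0\n"
    b"Content-Transfer-Encoding: 7bit\n"
    b"\n"
    b"j'ai un probl\xe8me de python.\n"
)

msg = message_from_bytes(raw, policy=policy.default)

# String access: the undecodable byte comes back as U+FFFD.
text = msg.get_content()
print(repr(text))

# Bytes access: the original byte is still there.
original = msg.get_payload(decode=True)
print(repr(original))

# Serializing back to bytes (BytesGenerator under the hood) keeps it too.
print(b"\xe8" in msg.as_bytes())
```

This only exercises the plain body path; headers with undeclared non-ASCII bytes go through different code and may behave differently.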
participants (4)
- Glenn Linderman
- R. David Murray
- Stephen J. Turnbull
- Steven D'Aprano