[Email-SIG] API for Header objects

Tony Nelson tonynelson at georgeanelson.com
Fri Apr 17 19:37:43 CEST 2009


At 19:04 +0900 04/17/2009, Stephen J. Turnbull wrote:
>Tony Nelson writes:
>
> > This example seems tortured and contrived.
>
>Not at all.  I currently use grep, not the email package, but in fact
>I extract several headers for use in mailing list moderation.  It's
>getting to the point where my gradually accreting shell script doesn't
>cut it (more because I'm recruiting additional moderators than because
>I'm not happy with it), and if I'm going to do this in Python I
>definitely want an obvious and elegant way to produce a displayable
>string (ie, Unicode) because not all of the messages I get in Chinese
>and Korean are spam.

Now /that/ is a use case.  Spam headers are a poor one in any case, as
there are so many different ones.


> > Custom code to extract a single header one time to send to someone?
>
>That is precisely why we want a simple readable short elegant API.
>
>Like str(msg['To']).

Would that return the display-name (friendly name) for the listed mailboxes
in one string, presumably separated by commas?  How would you get the
addr-specs?  How would you get both?  Use bytes() to flatten all the data,
or just the addr-specs?
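
For comparison, here is a minimal sketch of how the existing package already
answers the "names, addr-specs, or both" question, using
email.utils.getaddresses().  The message text and the example.com addresses
are made up for illustration; this is not the API being proposed in this
thread.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message, invented for this sketch.
raw = (
    "To: Alice Example <alice@example.com>, bob@example.org\n"
    'Cc: "Carol, Q." <carol@example.net>\n'
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# get_all() returns every occurrence of a field; getaddresses() splits the
# combined values into (display-name, addr-spec) pairs.
pairs = getaddresses(msg.get_all('To', []) + msg.get_all('Cc', []))

display_names = [name for name, addr in pairs]  # '' when no name was given
addr_specs = [addr for name, addr in pairs]
```

The pairs keep both halves, so the caller decides which one to show.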


>This also suggests the sequence interface of msg['To'] should not
>contain tuples of strings, but rather NameAddr objects (taken from the
>RFC 5322 grammar).  Then to flatten a NameAddr, use str or bytes as
>appropriate.  So to present a list of addressees in a moderation
>interface, you could use

I was a bit sloppy.  Each tuple would be (character string, byte string):
in 2.x, unicode and str; in 3.x, str and bytes.  Flattening to bytes
(2.x: str) for export would be ._flatten().

In practice, the display-names and addr-specs may have had "defects" when
parsing the message.  Addr-specs are supposed to be ASCII, but the
local-part sometimes isn't.  Display-names don't always RFC 2047 decode
properly, or may have non-ASCII characters in them.
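
A rough sketch of what coping with such defects looks like with the current
email.header functions: decode the RFC 2047 encoded-words when possible and
fall back to the raw value when the charset label or the bytes are bad.  The
helper name display_text and the fallback policy are assumptions made for
this example, not part of the package.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header

def display_text(raw_value):
    """Best-effort conversion of a raw header value to displayable text.

    Hypothetical helper: returns the decoded text, or the raw value
    unchanged when the encoded-words cannot be decoded.
    """
    try:
        return str(make_header(decode_header(raw_value)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Unknown charset label, malformed encoding, or undecodable bytes.
        return raw_value

decoded = display_text("=?utf-8?b?5byg5LiJ?= <zhang@example.com>")
fallback = display_text("=?x-no-such-charset?q?abc?=")
```

A real moderation tool might prefer to flag the defect rather than silently
show the raw value; that choice is left open here.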

>    recips = list(msg['To']) + list(msg['Cc'])
>
>    # We have a utf-8 codec on stdout, between us and the wire.
>    print("<ul>\n")
>    for recip in recips:
>        print("  <li>")
>        print(htmlesc(str(recip)))
>        print("</li>\n")
>    print("</ul>\n")
>
>Of course for wire protocol, you just use "bytes" instead of "str".
>Hey! that's not bad, even if I do say so myself.

You wouldn't like

    for name, addr in msg['To'] + msg['Cc'] + msg['Bcc']:

instead?  str(addr) should work (IIUC Py3K) if addr is ASCII, as it should be.
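
With today's package a loop of that shape can be approximated as follows.
This is only a sketch: recipient_list_html is a hypothetical helper, and
html.escape stands in for the htmlesc() used in the quoted example.

```python
from email import message_from_string
from email.utils import getaddresses
from html import escape

def recipient_list_html(msg, fields=('To', 'Cc', 'Bcc')):
    """Render the recipients of msg as an HTML list (hypothetical helper)."""
    values = []
    for field in fields:
        values.extend(msg.get_all(field, []))
    lines = ['<ul>']
    for name, addr in getaddresses(values):
        # Show "name <addr>" when a display-name exists, else just the addr.
        label = '%s <%s>' % (name, addr) if name else addr
        lines.append('  <li>%s</li>' % escape(label))
    lines.append('</ul>')
    return '\n'.join(lines)

# Made-up message for illustration.
sample = message_from_string(
    "To: Ann <ann@example.com>\nCc: bob@example.org\n\nhi\n"
)
listing = recipient_list_html(sample)
```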


>...People (by which I mean nobody participating
>in this thread) think of email as text. ...

No, they don't.  You have to ask them the right questions.  Sure they'll
say text, but they really expect styled fancy structured colored text with
pictures and links and attached documents.  Roughly, they think of email as
web pages (archived HTML, if they knew the word).  Only the most
sophisticated or old and stubborn think of it otherwise.

>MTAs think of email as octet sequences.

Well, disagree with the MTA, and the MTA wins.  Messages on the wire /are/
bytes.

>Developers (especially Americans) have been sloppy about
>that distinction for *five* decades, and because until 2000 at least
>email was the sine qua non of networking, backward compatibility has
>long demanded incorporating all those mistakes in current practice.
>
>And now you're doing the same thing.  Email messages have at *least*
>four ways of manifesting in our world that email-sig needs to worry
>about: as byte sequences on the wire, as (mostly, anyway, and
>certainly the headers) texts in our MUAs, as whatever-they-really-are,
>and as the internal representation of the email package.  So depending
>on which side of the argument you feel like taking, you insist
>(inconsistently) that "an email is a byte string" or "a header is not
>a string at all, it's a structured thingie".  But it's not that easy.
>
>What we need to do is come up with an API that respects all of those
>aspects *simultaneously*, and allows us to elegantly but accurately
>change the perspective we use to view this "whatever-it-really-is".

That's why my proposal is so good, as it does this.


> > No, email is not text.  Email message bodies and some header fields
> > may represent text.  An email message is a byte sequence.  One
> > really needs to understand this in order to work with email at a
> > low level.
>
>Hm.  And here I was hoping that the email package would *implement*
>the low level, leaving me free to think about high-level things.

You have that now, and it is terribly hard to use.


> > When one does not understand, then the email package should lead
> > the user in the right direction.
>
>No, thank you.  Python is a double-opt-in language.  We're all
>consenting adults here.  Programmers who don't understand the RFCs are
>likely to be surprised in many places, but they asked for it, they got
>it.

Battery materials included!  Build your own batteries if you can learn how!
Some have done it in as little as two years.

There are other languages competing with Python, and users can choose to
use them instead.  Python's email package needs to stop requiring years of
study to use correctly.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

