[Email-SIG] API for Header objects

Stephen J. Turnbull stephen at xemacs.org
Fri Apr 17 12:04:39 CEST 2009


Tony Nelson writes:

 > This example seems tortured and contrived.

Not at all.  I currently use grep, not the email package, but in fact
I extract several headers for use in mailing list moderation.  It's
getting to the point where my gradually accreting shell script doesn't
cut it (more because I'm recruiting additional moderators than because
I'm not happy with it), and if I'm going to do this in Python I
definitely want an obvious and elegant way to produce a displayable
string (i.e., Unicode), because not all of the messages I get in Chinese
and Korean are spam.

 > Custom code to extract a single header one time to send to someone?

That is precisely why we want a simple, readable, short, elegant API.

Like str(msg['To']).

This also suggests the sequence interface of msg['To'] should not
contain tuples of strings, but rather NameAddr objects (taken from the
RFC 5322 grammar).  Then to flatten a NameAddr, use str or bytes as
appropriate.  So to present a list of addressees in a moderation
interface, you could use

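    # Proposed API: iterating over a header yields NameAddr objects.
    recips = list(msg['To']) + list(msg['Cc'])

    # We have a utf-8 codec on stdout, between us and the wire.
    # htmlesc stands for whatever HTML-escaping helper you prefer.
    print("<ul>")
    for recip in recips:
        print("  <li>" + htmlesc(str(recip)) + "</li>")
    print("</ul>")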

Of course, for the wire protocol, you just use "bytes" instead of "str".
Hey! that's not bad, even if I do say so myself.
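
A minimal sketch of what such an object could look like.  The NameAddr
class below is hypothetical (nothing with that name exists in the email
package); only email.utils.formataddr is a real library call, and its
charset argument assumes Python 3.3 or later.

```python
from email.utils import formataddr


class NameAddr:
    """Hypothetical name-addr object: a display name plus an addr-spec."""

    def __init__(self, display_name, addr_spec):
        self.display_name = display_name
        self.addr_spec = addr_spec

    def __str__(self):
        # Display form: plain Unicode, nothing encoded.
        if self.display_name:
            return "{} <{}>".format(self.display_name, self.addr_spec)
        return self.addr_spec

    def __bytes__(self):
        # Wire form: formataddr() turns a non-ASCII display name into an
        # RFC 2047 encoded word, so the result is pure ASCII.
        wire = formataddr((self.display_name, self.addr_spec), charset="utf-8")
        return wire.encode("ascii")


recip = NameAddr("김철수", "kim@example.com")
display = str(recip)   # text for the moderation interface
wire = bytes(recip)    # octets for the wire
```

The point of the design is that the caller never touches charsets or
encoded words: choosing str or bytes selects the view.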

 > Just hit "reply" and trim it yourself.

That won't work, for several reasons.

 > If you must, you can use .get_header('X-Spam-Evidence').flatten().
 > I doubt that anyone would actually do that, outside of a debugging
 > session.

<sigh />  I do it.

 > No.  This is important, and you will not understand RFC x822 email
 > until you understand this: email messages are not character
 > strings.  They are byte sequences.  This confusion pervades the
 > email package only because in Python before 3.x, bytes were
 > represented as strings.

That's a bit generous and ungenerous at the same time.  The people who
worked on email were trying to come up with a reasonable interface
that on the one side treated wire format as bytes (Python 1.x, 2.x
str) and display format as text (Python 1.x str, oops, Python 2.x
unicode).  They failed, unfortunately, but not really because the
tools were unavailable.  They just treated the difficulties with
insufficient respect.  On the other hand, these difficulties are
inherent in the medium.  People (by which I mean nobody participating
in this thread) think of email as text.  MTAs think of email as octet
sequences.  Developers (especially Americans) have been sloppy about
that distinction for *five* decades, and because until 2000 at least
email was the sine qua non of networking, backward compatibility has
long demanded incorporating all those mistakes in current practice.

And now you're doing the same thing.  Email messages have at *least*
four ways of manifesting in our world that email-sig needs to worry
about: as byte sequences on the wire, as text (mostly, anyway, and
certainly the headers) in our MUAs, as whatever-they-really-are,
and as the internal representation of the email package.  So depending
on which side of the argument you feel like taking, you insist
(inconsistently) that "an email is a byte string" or "a header is not
a string at all, it's a structured thingie".  But it's not that easy.

What we need to do is come up with an API that respects all of those
aspects *simultaneously*, and allows us to elegantly but accurately
change the perspective we use to view this "whatever-it-really-is".
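
One way to picture such an API is a header object that keeps the received
octets as the source of truth and derives the other views on demand.  This
is only a sketch under assumptions: the AddressHeaderView class is
hypothetical, and it yields plain (name, address) pairs for brevity.  The
helpers decode_header, make_header and getaddresses are existing functions
in email.header and email.utils.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


class AddressHeaderView:
    """Hypothetical header object; the received bytes stay authoritative."""

    def __init__(self, raw):
        self.raw = raw  # header value exactly as it arrived, as bytes

    def _ascii_text(self):
        # surrogateescape so that stray 8-bit octets do not raise here.
        return self.raw.decode("ascii", "surrogateescape")

    def __bytes__(self):
        # Wire view: hand back the octets untouched.
        return self.raw

    def __str__(self):
        # Display view: RFC 2047 encoded words decoded to Unicode.
        return str(make_header(decode_header(self._ascii_text())))

    def __iter__(self):
        # Structured view: one (display name, address) pair per mailbox.
        for name, addr in getaddresses([self._ascii_text()]):
            yield str(make_header(decode_header(name))), addr


raw = b"=?utf-8?q?=EA=B9=80?= <kim@example.com>, bob@example.org"
header = AddressHeaderView(raw)
text = str(header)
pairs = list(header)
```

The same object answers bytes(header), str(header) and list(header), so
switching perspective never requires the caller to re-parse anything.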

 > No, email is not text.  Email message bodies and some header fields
 > may represent text.  An email message is a byte sequence.  One
 > really needs to understand this in order to work with email at a
 > low level.

Hm.  And here I was hoping that the email package would *implement*
the low level, leaving me free to think about high-level things.
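
For comparison, later versions of the standard library moved in this
direction.  The sketch below assumes Python 3.6 or later, where
email.policy.default makes msg['To'] return a header object whose string
value is already decoded; the sample message is made up for illustration.

```python
from email import message_from_bytes
from email.policy import default

# A made-up message; the display name is the Korean syllable "김",
# Q-encoded as an RFC 2047 encoded word.
raw = (
    b"To: =?utf-8?q?=EA=B9=80?= <kim@example.com>\r\n"
    b"Subject: moderation test\r\n"
    b"\r\n"
    b"body\r\n"
)

msg = message_from_bytes(raw, policy=default)

to_text = str(msg["To"])  # decoded, displayable text
names = [addr.display_name for addr in msg["To"].addresses]
specs = [addr.addr_spec for addr in msg["To"].addresses]
```

Here the package does the low-level decoding, and the caller only decides
whether it wants the flattened string or the structured addresses.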

 > When one does not understand, then the email package should lead
 > the user in the right direction.

No, thank you.  Python is a double-opt-in language.  We're all
consenting adults here.  Programmers who don't understand the RFCs are
likely to be surprised in many places, but they asked for it, they got
it.


