[Email-SIG] API for Header objects [was: Dropping bytes "support" in json]

Tony Nelson tonynelson at georgeanelson.com
Thu Apr 16 20:08:57 CEST 2009


At 23:02 +1000 04/16/2009, Steven D'Aprano wrote:
>On Thu, 16 Apr 2009 10:39:52 am Tony Nelson wrote:
>
>> I don't want there to be any "str(msg['tag'])" or "bytes(msg['tag'])"
>> at all, so there would be no loss of consistency.
>
>That's ... different.
>
>
>> Messages need
>> flattening to bytes, but there is no use for converting individual
>> header fields into bytes or strings, outside of a message.
>
>Of course there is. You create each header individually, so you should
>be able to extract each header individually. Here, for example, is a
>use-case: I want to send postmaster a copy of the X-Spam-Evidence
>header so she can see why a particular piece of ham got wrongly flagged
>as spam, or vice versa:
>
>X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; '(which': 0.03;
>  'attribute': 0.04; 'objects': 0.04; 'returns': 0.05; 'split':
>  0.05; ...
>
>I need to be able to extract just that one header, and while some
>applications (mail client?) may choose to give me the entire message as
>text and expect me to manually hunt for the relevant line and
>copy-and-paste it, other applications may wish to automatically extract
>the appropriate header and email it to postmaster at localhost. Or write
>it to a log file, or whatever. Whatever they do, they probably need it
>as a string (of characters or bytes), not a binary blob.

This example seems tortured and contrived.  Custom code to extract a single
header one time to send to someone?  Just hit "reply" and trim it yourself.
If you must, you can use .get_header('X-Spam-Evidence').flatten().  I doubt
that anyone would actually do that, outside of a debugging session.

Any automatic process for sending reflected spam should include more of the
message, using the relevant MIME type message/partial (or message/rfc822).
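The following is a minimal sketch of such a report, again assuming the Python 3.6+ `email.message.EmailMessage` API. It attaches the original message whole as a message/rfc822 part. It does not attempt message/partial. All addresses and text are placeholders.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

# Placeholder original message that was (mis)classified.
original = message_from_bytes(
    b"From: sender@example.com\n"
    b"Subject: meeting notes\n"
    b"X-Spam-Evidence: '*H*': 0.02; '*S*': 0.97\n"
    b"\n"
    b"See attached notes.\n",
    policy=default,
)

report = EmailMessage()
report["From"] = "user@example.com"          # placeholder addresses
report["To"] = "postmaster@example.com"
report["Subject"] = "Misclassified message"
report.set_content("The attached message was flagged as spam in error.\n")

# Passing a Message object to add_attachment() creates a message/rfc822
# part, so the postmaster receives every header and the body, not one line.
report.add_attachment(original)

attached = next(report.iter_attachments())
print(report.get_content_type(), attached.get_content_type())
```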


>> Some
>> header field data /is/ strings, some is lists of address pairs, and
>> so on.
>
>But "lists of address pairs" themselves are strings.

Wrong!  They are *lists* (or at least sequences) of address pairs, each a
friendly name and an email address.  Just as bytes are not strings, dicts
are not strings, and JPEG images are not strings, lists are not strings.
For a better understanding of what an address is, see RFC 5322 (the
current incarnation of RFC x822), section 3.4, which describes both the
preferred form and current or obsolete practice.


>> If the data for a header field is not properly a string,
>
>But it always is.

No.  This is important, and you will not understand RFC x822 email until
you understand this:  email messages are not character strings.  They are
byte sequences.  This confusion pervades the email package only because in
Python before 3.x, bytes were represented as strings.
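The point can be seen with Python 3's `email` package. This sketch uses the 3.6+ API. The parser takes bytes, and body text exists only after the charset declared by the message itself has been applied. The sample message is invented.

```python
from email import message_from_bytes
from email.policy import default

# A message whose body contains a raw 8-bit ISO-8859-1 byte (0xE9, "é").
RAW = (
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)

msg = message_from_bytes(RAW, policy=default)

# The body's bytes, undecoded: this is what actually travelled on the wire.
body_bytes = msg.get_payload(decode=True)

# Only the charset parameter says how to turn those bytes into text.
charset = msg.get_content_charset()
body_text = msg.get_content()

print(body_bytes, charset, repr(body_text))
```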


>Even badly formatted emails with corrupt headers containing binary
>characters are strings -- they're just byte (non-Unicode) strings
>containing binary characters. Your mail server might not accept it as
>part of a valid header, but it's a valid byte string.

Strings are not bytes.  Sequences of bytes are not strings.  Converting
between them demands an encoding.  Sometimes the encoding exists, sometimes
it mostly exists, and sometimes there is no such encoding, as for a JPEG
image, which is a structured byte sequence.
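Here is a small sketch of "converting demands an encoding", using only built-in `bytes.decode`. The same bytes give different text under different encodings and are invalid under others. The leading bytes of a JPEG file are not UTF-8 at all. The helper `try_decode` is hypothetical and exists only for this illustration.

```python
def try_decode(data: bytes, encoding: str):
    """Hypothetical helper: decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

word = b"caf\xe9"                   # "café" as ISO-8859-1 bytes
jpeg_start = b"\xff\xd8\xff\xe0"    # typical first bytes of a JPEG file

# The same bytes yield different text under different encodings...
as_latin1 = try_decode(word, "iso-8859-1")
as_cp437 = try_decode(word, "cp437")
# ...and are simply invalid under others.
as_utf8 = try_decode(word, "utf-8")

# JPEG data is not UTF-8.  Latin-1 maps every byte to *some* character,
# so it never fails, but the result is not meaningful text.
jpeg_utf8 = try_decode(jpeg_start, "utf-8")
jpeg_latin1 = try_decode(jpeg_start, "iso-8859-1")

print(as_latin1, as_cp437, as_utf8, jpeg_utf8, repr(jpeg_latin1))
```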

>> a means to get it as one is wrong.
>
>Email *is* text. It's built on top of a restricted range of ASCII bytes,
>which we can legitimately call "text" because it is a subset of Unicode
>text. Even if a particular header contains binary data, it must be
>encoded as ASCII text before it can be placed into the header.
 ...
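For reference, the standard mechanism for placing non-ASCII text in a header is the RFC 2047 encoded word. The following small sketch uses the stdlib `email.header` module, and the sample text is invented. It illustrates both positions in this exchange. The header bytes are ASCII, but recovering the text still requires the charset named inside the encoded word.

```python
from email.header import Header, decode_header, make_header

# Encode a non-ASCII value into an ASCII-only RFC 2047 encoded word.
encoded = Header("caf\u00e9", charset="utf-8").encode()
print(encoded)

# decode_header() returns (bytes, charset) chunks: raw bytes plus the
# charset needed to interpret them.
chunks = decode_header(encoded)
print(chunks)

# make_header() applies those charsets to recover the text.
text = str(make_header(chunks))
print(text)
```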

No, email is not text.  Email message bodies and some header fields may
represent text.  An email message is a byte sequence.  One really needs to
understand this in order to work with email at a low level.  When one does
not, the email package should lead the user in the right direction.
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

