[Email-SIG] API for Header objects [was: Dropping bytes "support" in json]

Tony Nelson tonynelson at georgeanelson.com
Thu Apr 16 20:08:59 CEST 2009


At 15:24 +0900 04/16/2009, Stephen J. Turnbull wrote:
>Tony Nelson writes:
>
> > strings, some is lists of address pairs, and so on.  If the data
> > for a header field is not properly a string, a means to get it as
> > one is wrong.
>
>Er, but the data for an address field is not "properly" a list of
>pairs, either.  So I guess you would agree that a means to get it as
>one is wrong, then?

No.  The useful data for an address field is *properly* a list of
(friendly name, address) pairs -- you should read RFC 5322 section 3.4.
You need to understand this about email in order to continue this
discussion, though your confusion does bring up the important point that
people have a poor understanding of email, and need guidance in how to use
and compose it.
This makes it very important that the easy way of doing things be the
correct way.  With Address fields, that way is a sequence of pairs of
friendly name and address.  Though the address could be parsed further,
there is seldom any need to do so (outside of the Header parser itself).
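
As a rough illustration of "a sequence of pairs", the existing
email.utils.getaddresses() already returns data in this shape.  The field
value and the example.org / example.com addresses below are placeholders
made up for the sketch, not taken from any real message.

```python
from email.utils import getaddresses

# Hypothetical field value; both addresses are placeholders.
to_value = (
    "Stephen J. Turnbull <stephen@example.org>, "
    "Tony Nelson <tony@example.com>"
)

# getaddresses() takes a list of field values and returns a list of
# (friendly name, address) pairs, one per mailbox found.
pairs = getaddresses([to_value])

for friendly_name, address in pairs:
    print(friendly_name, "->", address)
```

The address part is left as a single string, which matches the point above
that callers seldom need it parsed any further.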


> > All the grotty internals of Header objects would be accessible by
> > fetching the Header object with "msg.get_header('name')".
> > "msg[...]" is an abbreviation for convenience which should not
> > mislead users or be complex or magical in action.
>
>A message or so back you made the point that an address header is a
>rather complex object that is *not* easy to parse.

Which is exactly why the email package already has an address parser,
though it also needs a more general parser for the other header field types.

>For example (this
>is a trick question), in your opinion, what should
>
>    msg['To'][0]
>
>return if the original header was
>
>To: Stephen J. Turnbull <stephen at xemacs.org>
>
>?

('Stephen J. Turnbull', 'stephen at xemacs.org')

You must be very confused to think this is a trick question.  Try it with
the current email package's email.utils.parseaddr().  Again, see RFC 5322
section 3.4.
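
For anyone who wants to try it, a minimal check along these lines should
do.  The address uses a placeholder example.org domain rather than the real
one, since the archive obscures addresses.

```python
from email.utils import parseaddr

# parseaddr() parses a single address and returns a
# (friendly name, address) pair.
name, address = parseaddr("Stephen J. Turnbull <stephen@example.org>")

print((name, address))
```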


> > Internally, the Header whose .useful attribute is returned by
> > "msg['foo']" will contain parsed data, referring to parsed tokens.
> > Flattening those parsed tokens will produce the original data.  Not
> > a problem at all, simple to implement, in the most direct way.
>
>And horrid to use, if you mean that the internal representation will
>be a full parse tree according to the augmented BNF in RFCs 822, 2822,
>5322, 2045-2049, etc etc., and that the only other way to access that
>data is via an arbitrarily defined .useful attribute (which, BTW, is
>quite unpythonic if you intend for it to be available as msg['foo'] as
>well: TOOWTDI).

You put words in my mouth.  Why assume that I am incompetent, or a fool?
Of course the internal representation would include the full parse tree.
Of course the external interface would provide read and write access to the
relevant data.  The .useful attribute (need a better name) is the way to
read the useful part of the data extracted from the parse tree, whatever
type of data that is, which depends on the header field type, determined by
its name.  Each Header subclass would have its own other attributes.  The
.useful attribute guides users and is used by .__getitem__() to return that
data.
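
A minimal sketch of that arrangement follows, under loud assumptions: the
class names SketchHeader, SketchAddressHeader and SketchMessage, the
set_header() method, and the name-to-type table are all invented here for
illustration and are not part of the email package.  The raw field text
stands in for the full parse tree, so flatten() reproduces the original
data only trivially.

```python
from email.utils import getaddresses


class SketchHeader:
    """Hypothetical base Header: keeps the original text, exposes .useful."""

    def __init__(self, name, raw):
        self.name = name
        self.raw = raw  # stand-in for the full parse tree

    @property
    def useful(self):
        # For an unstructured field the useful data is just the text.
        return self.raw

    def flatten(self):
        # Reproduce the original field from what was stored.
        return f"{self.name}: {self.raw}"


class SketchAddressHeader(SketchHeader):
    """Hypothetical address field: .useful is a list of pairs."""

    @property
    def useful(self):
        return getaddresses([self.raw])


class SketchMessage:
    """Hypothetical message: msg[name] abbreviates get_header(name).useful."""

    # Assumed mapping from field name to Header subclass.
    _types = {"to": SketchAddressHeader, "cc": SketchAddressHeader,
              "from": SketchAddressHeader}

    def __init__(self):
        self._headers = {}

    def set_header(self, name, raw):
        cls = self._types.get(name.lower(), SketchHeader)
        self._headers[name.lower()] = cls(name, raw)

    def get_header(self, name):
        return self._headers[name.lower()]

    def __getitem__(self, name):
        return self.get_header(name).useful


msg = SketchMessage()
msg.set_header("To", "Stephen J. Turnbull <stephen@example.org>")
msg.set_header("Subject", "API for Header objects")
print(msg["To"][0])
print(msg.get_header("To").flatten())
```

The type of msg[...] depends only on the field name, and everything else
stays reachable through the Header object returned by get_header().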
-- 
____________________________________________________________________
TonyN.:'                       <mailto:tonynelson at georgeanelson.com>
      '                              <http://www.georgeanelson.com/>

