[Email-SIG] API for Header objects [was: Dropping bytes "support" in json]

Stephen J. Turnbull stephen at xemacs.org
Fri Apr 17 12:09:42 CEST 2009


Tony Nelson writes:

 > No.  The useful data for an address field is *properly* a list of
 > pairs of friendly name, address -- you should read RFC 5322 section
 > 3.4.

The fact that you think I didn't suggests there's really no point in
continuing to talk to you.  But I'll give it another try.

The issues we are dealing with at this point really have very little
to do with accurate implementation of the RFCs.  We all know that's
necessary, but ... it's a Simple Matter Of Programming.  At least,
that's why Postel, Crocker, et al put so much effort into writing the
RFCs, so it would be a SMOP.  I think they did a pretty good job.

I agree with you that we should make it relatively difficult to put
things that *don't* conform to the RFCs on the wire.  But that should
be the responsibility of the middleware that talks to the file system
and to the MTA.  I see no reason *at this stage* to burden MUA (in the
general sense) developers with all the RFC rules, and MDA/MTA writers
"should" only need to worry about it for error handling (__bytes__()
should normally do the job for them).  (For values of "should"
equivalent to "in my dreams", I do fear.)

 > This makes it very important that the easy way of doing things be
 > the correct way.  With Address fields, that way is

Nonsense.  You are ignoring the fact that *people* (i.e., nobody
participating in this thread<wink>) read an address field *as text*,
and they type in addresses *as text*.  We do not extract and inject
this information as pickles of Header objects via Firewire sockets
implanted in their skulls.  There is *no /unique/ correct way* here.

 > >For example (this is a trick question), in your opinion, what
 > >should
 > >
 > >    msg['To'][0]
 > >
 > >return if the original header was
 > >
 > >To: Stephen J. Turnbull <stephen at xemacs.org>
 > >
 > >?
 > 
 > ('Stephen J. Turnbull', 'stephen at xemacs.org')
 > 
 > You must be very confused to think this is a trick question.
 > Try it with the current email package's email.utils.parseaddr().
 > Again, see RFC5322 section 3.4.

But section 3.4 is not relevant to the trickiness, and parseaddr is
not strictly conforming.  See the definitions of name-addr,
display-name, phrase, word, atom, and atext in sections 3.2.3, 3.2.5,
and 3.4 of the RFC you cite.  Also see the definition of special.
Finally, I commend to your attention the definition of obs-phrase in
section 4.1, and the *very* special nature of this particular gotcha
as described there.

The point is that by parsing that and claiming it's an RFC 5322
section 3.4 name-addr, you have invoked the rather magical Postel
Principle.  You either have to say "for my purpose I want magic in the
API" (which you previously denied), or you have to admit that this is
harder than it looks.
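
To make the gotcha concrete, here is a minimal sketch of the strict
grammar check, assuming a simplified reading of "phrase" as
whitespace-separated atoms (quoted-strings, comments, and folding white
space are left out).  The names is_atom and is_strict_phrase and the
address user@example.org are illustrative only; email.utils.parseaddr is
the existing function under discussion.

```python
import re
from email.utils import parseaddr

# atext per RFC 5322 section 3.2.3: letters, digits, and a fixed set of
# punctuation.  "." is a special, so it is deliberately absent here.
ATEXT_RUN = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+")


def is_atom(word):
    """Return True if word consists only of atext characters."""
    return ATEXT_RUN.fullmatch(word) is not None


def is_strict_phrase(text):
    """Simplified check: every whitespace-separated word must be an atom.

    Assumption: quoted-strings, comments, and folding white space are
    ignored, so this is stricter than the full grammar.
    """
    words = text.split()
    return bool(words) and all(is_atom(w) for w in words)


display_name = "Stephen J. Turnbull"

# The word "J." contains a special, so the unquoted name fails the
# strict test; only the obsolete syntax lets it through.
print(is_strict_phrase(display_name))

# parseaddr accepts the header body anyway and returns a pair.
print(parseaddr(display_name + " <user@example.org>"))
```

The lenient parse is the convenient answer, but it only exists because
the parser accepted something the strict grammar rejects.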

It is true that section 4.1 says that the obsolete ("interpreting")
syntax must be accepted *off the wire*.  So there certainly is a
justification for having a short obvious elegant spelling for "make an
address Header into a sequence".  But IMHO that spelling should be
"list(msg['To'])", not "msg['To']".

The rationale is that---assuming it can be implemented---several of us
would like to be able to spell "wire format" as "bytes(msg['To'])" and
"display format" as "str(msg['To'])".  I bet there are other uses that
would be well-served by such indirection.  And I would be disappointed
if we can't do way better than "msg.get_header('To').flatten()" to get
bytes---or should that be string?---out.
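
A rough sketch of what those three spellings could look like on one
object.  Everything here is hypothetical: AddressHeader is not the email
package's Header class, the address is a placeholder, and the wire
format assumes ASCII-only content with no RFC 2047 encoding or folding.

```python
from email.utils import formataddr


class AddressHeader:
    """Hypothetical address header holding (display-name, address) pairs."""

    def __init__(self, name, pairs):
        self.name = name
        self._pairs = list(pairs)

    def __iter__(self):
        # list(header) gives the sequence of (display-name, address) pairs.
        return iter(self._pairs)

    def __str__(self):
        # Display format: what a person reads, with no RFC quoting.
        parts = []
        for display_name, address in self._pairs:
            if display_name:
                parts.append("%s <%s>" % (display_name, address))
            else:
                parts.append(address)
        return ", ".join(parts)

    def __bytes__(self):
        # Wire format of the field body: formataddr quotes display names
        # that contain specials.  Assumption: ASCII-only content.
        body = ", ".join(formataddr(pair) for pair in self._pairs)
        return body.encode("ascii")


to = AddressHeader("To", [("Stephen J. Turnbull", "user@example.org")])
print(list(to))
print(str(to))
print(bytes(to))
```

The point of the indirection is that the same Header object answers all
three questions, and only bytes() has to care about quoting rules.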

 > > > Internally, the Header whose .useful attribute is returned by
 > > > "msg['foo']" will contain parsed data, referring to parsed tokens.
 > > > Flattening those parsed tokens will produce the original data.  Not
 > > > a problem at all, simple to implement, in the most direct way.
 > >
 > >And horrid to use, if you mean that the internal representation will
 > >be a full parse tree according to the augmented BNF in RFCs 822, 2822,
 > >5322, 2045-2049, etc etc., and that the only other way to access that
 > >data is via an arbitrarily defined .useful attribute (which, BTW, is
 > >quite unpythonic if you intend for it to be available as msg['foo'] as
 > >well: TOOWTDI).
 > 
 > You put words in my mouth.

Of course I don't put words in your mouth.  The phrase "if you mean
that" clearly indicates that what follows is *my* understanding of the
implications of what you wrote.  I think that interpretation is quite
justifiable based on your insistence that the OOWTDI be your "sequence
of (display-name, address) pairs."

 > Why assume that I am incompetent, or a fool?

I don't assume any such thing.  But I become less and less trustful of
your goodwill toward requirements other than your own.

 > Of course the internal representation would include the full parse tree.
 > Of course the external interface would provide read and write access to the
 > relevant data.

Note that I didn't say it wouldn't.  I said it *would*.  But I think
it's justified, by what you have written so far, to expect that it
would be an inconvenient interface (maybe even "horridly" so).

 > The .useful attribute (need a better name)

I like __getitem__(), __str__(), and __bytes__(), for starters.  I
think we do *need* multiple names, because different presentations are
"useful" in different contexts.

 > is the way to read the useful part of the data extracted from the
 > parse tree, whatever type of data that is, which depends on the
 > header field type, determined by its name.  Each Header subclass

Please remember that Barry says he doesn't like subclassing to deal
with issues of variation in header semantics, based on his experience
with it in past versions of the email package.  I'm not sure how he
plans to avoid it (I suspect he'll be forced to give it up because
what he comes up with will be horrid<wink>), but at this stage we
really shouldn't assume that we can freely subclass Header.

 > would have its own other attributes.  The .useful attribute guides
 > users and is used by .__getitem__() to return that data.

As I said before, I agree with RDM (not to mention pretty much
everybody but you who has posted on this topic) that there should be
one more level of indirection here.  I.e., __getitem__ should return a
Header object.
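
A minimal sketch of that extra level of indirection, under stated
assumptions: SketchMessage and SketchHeader are hypothetical names, not
classes in the email package, and the address is a placeholder.  Only
email.utils.getaddresses is an existing function.

```python
from email.utils import getaddresses


class SketchHeader:
    """Hypothetical header object that keeps the text as received."""

    def __init__(self, name, raw):
        self.name = name
        self.raw = raw

    def __str__(self):
        # The text form, unchanged.
        return self.raw

    def __iter__(self):
        # The sequence form is computed only when the caller asks for it.
        return iter(getaddresses([self.raw]))


class SketchMessage:
    """Hypothetical message whose __getitem__ returns a header object."""

    def __init__(self):
        self._headers = {}

    def __setitem__(self, name, raw):
        # Header names are matched case-insensitively.
        self._headers[name.lower()] = SketchHeader(name, raw)

    def __getitem__(self, name):
        # Return the header object itself, not a pre-chosen "useful" view.
        return self._headers[name.lower()]


msg = SketchMessage()
msg["To"] = "Stephen J. Turnbull <user@example.org>"
to = msg["To"]
print(str(to))
print(list(to))
```

The caller picks the presentation at the point of use, so the message
does not have to guess which view is the useful one.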


