[Email-SIG] fixing the current email module

Sun Oct 11 00:58:38 CEST 2009

On approximately 10/10/2009 2:20 PM, came the following characters from 
the keyboard of R. David Murray:
> On Fri, 9 Oct 2009 at 11:59, Glenn Linderman wrote:
>> On approximately 10/9/2009 5:05 AM, came the following characters 
>> from the keyboard of Barry Warsaw:
>>>  On Oct 8, 2009, at 6:39 PM, Glenn Linderman wrote:
>>> >  1) wire format.  Either what came in, in the parser case, or what
>>> >  would be generated.
>>> >  2) internal headers from the MIME part
>>> >  3) decoded BLOB.  This means that quopri and base64 are decoded, no
>>> >  more and no less.  This is bytes.  No headers, only payload.  For
>>> >  Content-Transfer-Encoding: binary, this is mostly a noop.
>>> >  4) text/* parts should also be obtainable as str()/unicode(),
>>> >  payload only.  This is where charset decoding is done.
>>> >
>>> >  I think your talk in the next paragraph about hooks and other object
>>> >  types being produced is a generalization of 4, not 3, and generally
>>> >  no additional decoding needs to be done, just conversion to the
>>> >  right object type (or file, or file-like object).
>>>  I mostly agree with that.  I've always called #4 the "decoded payload"
>>>  and #3 I've usually called the "raw payload".  Maybe we can bikeshed on
>>>  better terms to help inform us about the API's method/attribute names.
>>
>> It would be good though to have standardized terms for easier 
>> communication. Maybe as they are chosen, they could be added to that 
>> Wiki RDM set up?
>
> I didn't set it up, Barry did.  I just started adding stuff ;)

OK.  I seem to have an account there, so I made some edits.

>> My only problem with "raw" and "decoded" payload is that there are 3
>> payload formats, not 2, so there needs to be a 3rd term; the three
>> correspond to #1, #3, and #4, above.  #2 is somewhat orthogonal
>> to the payload.
>>
>> To me, "raw" conjures up #1, not #3.
>
> I think I understand why Barry uses it for #3: it's the 'raw data' that
> went in to get transfer-encoded in the first place.  But clearly the
> term is ambiguous.

I found it so.
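
To make the three payload forms concrete, here is a minimal sketch using
the email package as it ships in present-day Python 3 (an assumption; it
is not the API under discussion in this thread).  The sample message and
all variable names are hypothetical and chosen only for illustration.

```python
import email

# Hypothetical sample: one text/plain part, base64 transfer-encoded.
# Form #1 (wire format) is this whole byte string, headers included.
WIRE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"Y2Fmw6k=\r\n"
)

msg = email.message_from_bytes(WIRE)

# The body as it appeared on the wire, still base64 text.
encoded_body = msg.get_payload(decode=False)

# Form #3: transfer decoding only (base64/quopri undone).  Still bytes.
transfer_decoded = msg.get_payload(decode=True)

# Form #4: charset decoding on top of #3, giving text.
charset = msg.get_content_charset() or "ascii"
text = transfer_decoded.decode(charset)

print(repr(encoded_body.strip()))
print(repr(transfer_decoded))
print(repr(text))
```

Note that "raw" could plausibly name either WIRE or transfer_decoded
here, which is the ambiguity in question.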

> I have set up two more documents on the wiki.  One is UseCases[1], and
> I've tried to copy into it all of the use cases that have been mentioned
> in this discussion, plus a few more.  Edits welcome.

I hadn't seen UTF-16/-32/-BE/-LE mentioned in this discussion, but the 
MIME RFCs do mention use cases that require them, so I added them to 
RFC822 handling.  They might fit better under HTTP handling, though.  Or 
maybe elsewhere?
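
As a small illustration of why these charsets interact with the transfer
encoding step: UTF-16 text contains NUL bytes even for plain ASCII
characters, so it cannot travel as 7bit and needs something like base64.
This is only a sketch using Python's standard base64 module; the sample
string and variable names are hypothetical.

```python
import base64

text = "caf\u00e9"

# Charset encoding: every character becomes two bytes in UTF-16-LE,
# so ASCII characters carry a NUL high byte.
utf16 = text.encode("utf-16-le")
has_nul = b"\x00" in utf16

# Transfer encoding: base64 makes the bytes safe for a 7bit channel.
encoded = base64.b64encode(utf16)

# The receiver undoes the two steps in reverse order.
roundtrip = base64.b64decode(encoded).decode("utf-16-le")

print(utf16, has_nul, encoded, roundtrip)
```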

> The other is a Glossary[2].  I think most of it accurately reflects the
> consensus here, but in it I'm proposing to use the term
> 'transfer-decoded' for #3, and 'transfer-encoded' as an alternative to
> 'wire-format' just for symmetry.  Comments and suggestions welcome.

I like the distinction you made: 'wire format' is data "in the wild", not 
known to be RFC compliant, while 'transfer-encoded' is the generated form, 
which is compliant.  I would think that if we get data as far as 
'transfer-decoded', we've (mostly) proven that the received 'wire format' 
is compliant, or can be made compliant.  (I switched "conformant" to 
"compliant": I could not find the former at dictionary.com, and I did not 
like "conformable", which I found there, because in my head it seems to 
imply "able to be changed to conform", although the definition does not 
say that.)

> Any other terms of art we should record?
>
> --David
>
> [1] http://wiki.python.org/moin/Email%20SIG/UseCases
> [2] http://wiki.python.org/moin/Email%20SIG/Glossary
>

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking