On Mon, 13 Apr 2009 at 10:28, Barry Warsaw wrote:
On Apr 11, 2009, at 8:39 AM, Chris Withers wrote:
Barry Warsaw wrote:
message['Subject'] The raw bytes or the decoded unicode?
A header object.
Yep. You got there before I did. :)
Okay, so you've picked one. Now how do you spell the other way?
Yes for unstructured headers like Subject. For structured headers... hmm.
Some "reasonable" printable interpretation that has no semantic meaning?
Now, setting headers. Sometimes you have some unicode thing and sometimes you have some bytes. You need to end up with bytes in the ASCII range and you'd like to leave the header value unencoded if so. But in both cases, you might have bytes or characters outside that range, so you need an explicit encoding, defaulting to utf-8 probably.
Message.set_header('Subject', 'Some text', encoding='utf-8')
Message.set_header('Subject', b'Some bytes')
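To make the rule above concrete, here is a minimal sketch of what a set_header() implementation might do with its value. The helper name encode_header_value is hypothetical and not part of the email package; only email.header.Header and decode_header are existing stdlib calls. It assumes the encoding argument describes both how to decode incoming bytes and which charset to use for the encoded-word.

```python
from email.header import Header, decode_header


def encode_header_value(value, encoding='utf-8'):
    """Return an ASCII-only str suitable for use as a header value.

    Hypothetical helper: bytes are decoded with `encoding` first, so a
    wrong `encoding` raises UnicodeDecodeError instead of guessing.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    try:
        # Pure ASCII text is left unencoded, as proposed above.
        value.encode('ascii')
        return value
    except UnicodeEncodeError:
        # Anything else becomes an RFC 2047 encoded-word.
        return Header(value, encoding).encode()


# Round trip: the encoded form decodes back to the original UTF-8 bytes.
encoded = encode_header_value('Caf\u00e9')
raw, charset = decode_header(encoded)[0]
```

Whether bytes without an explicit encoding should be accepted at all is exactly the open question in this thread; the sketch simply falls back to the utf-8 default.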
Where you just want "a damned valid email and stop making my life hard!":
Yes. In which case I propose we guess the encoding as 1) ascii, 2) utf-8, 3) wtf?
Given some Usenet postings I've just dealt with, (3) appears to sometimes be spelled 'x-unknown' and sometimes (in the most recent case) 'unknown-8bit'. A quick Google search turns up a hit on RFC 1428 for the latter, and a bunch of trouble tickets for the former... so I think 'wtf' is correctly spelled 'unknown-8bit'.
However, it's not supposed to be used by mail composers, who are expected to know the encoding. It's for mail gateways that are transforming something and don't know the encoding. I'm not sure what this means for the email module, which certainly will be used in mail gateways... maybe it's the responsibility of the application code to explicitly say 'unknown encoding'?
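For illustration, the guessing order proposed above (ascii, then utf-8, then give up) could be sketched roughly like this. The function name guess_charset is made up for the example, and returning 'unknown-8bit' as the fallback label follows the RFC 1428 spelling discussed above rather than anything the email package currently does.

```python
def guess_charset(raw):
    """Guess a charset label for raw header bytes.

    Hypothetical helper: tries ascii, then utf-8, and otherwise returns
    the 'unknown-8bit' label instead of raising.
    """
    for charset in ('ascii', 'utf-8'):
        try:
            raw.decode(charset)
        except UnicodeDecodeError:
            continue
        return charset
    return 'unknown-8bit'
```

Application code that does know the encoding would skip the guess entirely and pass it explicitly.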
Where you care about what encoding is used:
If you have bytes, for whatever reason:
...because only you know what encoding those bytes use!
So you're saying that __setitem__() should not accept raw bytes?
If I'm understanding things correctly, if it did accept bytes, the person using that interface would need to do whatever encoding (e.g. encoded-word) was needed, so the interface should check that the byte string is 8 bit clean. But having some sort of 'setraw' method on Header might be better for that case.
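A rough sketch of the kind of check a 'setraw'-style method might perform is below. The name check_raw_header is hypothetical, and the sketch reads the requirement as "the caller has already encoded everything, so only ASCII bytes may remain"; that reading is my assumption, not something settled in this thread.

```python
def check_raw_header(raw):
    """Validate bytes that the caller claims are already header-encoded.

    Hypothetical helper: rejects anything outside the ASCII range, on
    the assumption that encoded-words have already been applied.
    """
    try:
        raw.decode('ascii')
    except UnicodeDecodeError as exc:
        raise ValueError(
            'raw header contains non-ASCII bytes; encode it first'
        ) from exc
    return raw
```

Keeping this on a separate method would leave __setitem__() free to accept only text, which is the alternative being asked about above.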