[Python-Dev] Mailbox module - timings and functionality changes
R. David Murray
rdmurray at bitdance.com
Wed Jun 30 01:56:30 CEST 2010
On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden <steve at holdenweb.com> wrote:
> A.M. Kuchling wrote:
> > But should mailboxes really be opened in a UTF-8 encoding, or should
> > they be treated as 7-bit text? I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...
email6 *will* handle this use case. When it exists :) But note that it
is *not* just a matter of easily handling 8-bit characters. There are
a whole bunch of algorithms needed for interpreting that 7-bit and 8-bit
data. All the info is there in the email headers, but being able to do
string operations on 8-bit byte strings doesn't, by itself, get you the
answers you need.
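To make that concrete, here is a minimal sketch, assuming a Python 3 release whose standard-library email package provides message_from_bytes (3.2 or later). It is not email6, just an illustration: the sample message and the helper names build_raw and decode_body are invented for the example. The same body bytes only become meaningful text once the charset declared in the headers is consulted.

```python
import email


def build_raw(charset):
    """Build a tiny 8-bit message (hypothetical sample data).

    The body bytes are identical every time; only the declared charset
    in the Content-Type header changes.
    """
    return (
        b"Subject: test\n"
        b"MIME-Version: 1.0\n"
        b'Content-Type: text/plain; charset="' + charset.encode("ascii") + b'"\n'
        b"Content-Transfer-Encoding: 8bit\n"
        b"\n"
        b"caf\xe9\n"
    )


def decode_body(raw):
    """Decode the body using the charset named in the headers."""
    msg = email.message_from_bytes(raw)
    # Fall back to ASCII when no charset is declared.
    charset = msg.get_content_charset() or "ascii"
    # decode=True undoes the Content-Transfer-Encoding and returns bytes.
    payload = msg.get_payload(decode=True)
    return payload.decode(charset, errors="replace")


if __name__ == "__main__":
    raw = build_raw("iso-8859-1")

    # Neither "7-bit text" nor a blanket UTF-8 decode works on the raw bytes.
    for codec in ("ascii", "utf-8"):
        try:
            raw.decode(codec)
        except UnicodeDecodeError as exc:
            print(codec, "failed:", exc.reason)

    # Consulting the headers gives a usable answer.
    print(repr(decode_body(raw)))

    # Same bytes, different declared charset, different text.
    print(repr(decode_body(build_raw("koi8-r"))))
```

The byte-level parsing is the easy part here; the answer comes from reading Content-Type and Content-Transfer-Encoding, and a real mailbox adds multipart bodies, encoded-word headers, and messages that lie about their charset.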
It really is the case that the Python 3 bytes/unicode split forces us
to redo most of the algorithms so that they handle bytes and text
*correctly*. This isn't a trivial undertaking, but the end result
will be well worth it.
--
R. David Murray www.bitdance.com