[Python-Dev] Mailbox module - timings and functionality changes

R. David Murray rdmurray at bitdance.com
Wed Jun 30 01:56:30 CEST 2010


On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden <steve at holdenweb.com> wrote:
> A.M. Kuchling wrote:
> > But should mailboxes really be opened in a UTF-8 encoding, or should
> > they be treated as 7-bit text?  I'll have to think about this.
> 
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
> 
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...

email6 *will* handle this use case.  When it exists :)  But note that it
is *not* just a matter of easily handling 8-bit characters.  A whole
bunch of algorithms are needed to interpret that 7-bit and 8-bit data.
All the info is there in the email headers, but being able to do string
operations on 8-bit byte strings doesn't by itself get you the answers
you need.
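
To make that concrete, here is a minimal sketch of the point: the raw
bytes only become text once the headers have been parsed and consulted.
It uses the standard library `email` package as shipped with current
Python 3, not the email6 design discussed in this thread.  The sample
message, the addresses, and the helper name `decode_body` are all
hypothetical.

```python
from email import message_from_bytes

# Hypothetical message.  The byte 0xE9 means "é" only because the
# Content-Type header declares iso-8859-1.
RAW = (
    b"From: sender@example.com\n"
    b"Content-Type: text/plain; charset=iso-8859-1\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xe9\n"
)


def decode_body(raw):
    """Turn the body of a single-part message into text.

    Illustrative helper, not part of the email package.
    """
    msg = message_from_bytes(raw)
    # The charset comes from the headers, not from the body bytes.
    charset = msg.get_content_charset() or "ascii"
    # decode=True undoes the Content-Transfer-Encoding and returns bytes.
    payload = msg.get_payload(decode=True)
    return payload.decode(charset, "replace")


text = decode_body(RAW)
# Same body bytes, different declared charset, different text.
other = decode_body(RAW.replace(b"iso-8859-1", b"koi8-r"))
```

The bytes of the body are identical in both calls; only the header
differs, which is why byte-string operations alone are not enough.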

It really is the case that the Python 3 bytes/unicode split forces us
to redo most of the algorithms so that they handle bytes and text
*correctly*.  This isn't a trivial undertaking, but the end result
will be well worth it.
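
As one small illustration of where bytes and text meet, here is a sketch
of decoding an RFC 2047 encoded header with `email.header.decode_header`
from the current standard library.  That function can hand back either
`str` or `bytes` chunks depending on the input, so the caller has to
handle both.  The helper name `header_to_text` is hypothetical, and an
unknown charset name would still raise `LookupError` in this sketch.

```python
from email.header import decode_header


def header_to_text(value):
    """Decode a header value into text.

    Illustrative helper, not part of the email package.
    """
    parts = []
    for chunk, charset in decode_header(value):
        # Encoded words come back as bytes plus a charset name;
        # a header with no encoded words comes back as a plain str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", "replace")
        parts.append(chunk)
    return "".join(parts)


subject = header_to_text("=?utf-8?b?Y2Fmw6k=?=")
plain = header_to_text("plain subject")
```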

--
R. David Murray                                      www.bitdance.com

