[Python-Dev] email package status in 3.X

Antoine Pitrou solipsis at pitrou.net
Sun Jun 20 19:55:47 CEST 2010


On Sun, 20 Jun 2010 14:26:28 +0200
Giampaolo Rodolà <g.rodola at gmail.com> wrote:
> I attempted to port pyftpdlib to python 3 several times and the
> biggest show stopper has always been the bytes / string difference
> introduced by Python 3 which forces you to *know* and *use* Unicode
> every time you deal with some text and 2to3 is completely useless
> here.

I don't really understand what the difficulties are. A character is a
character; converting from bytes to characters requires knowing the
encoding, which your protocol should specify somewhere (of course, I
suppose FTP is old and crummy enough that it may not specify anything).

An "encoding" is nothing more than a transformation. When you get
gzipped data, you must decompress it before doing anything useful
with it. Similarly, when you get (say) UTF-8 data, you must decode it
before doing anything useful with it.
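
To make the analogy concrete, here is a minimal Python 3 sketch. The
sample string and the variable names are only illustrations; the point
is that decompressing and decoding are both plain transformations you
undo before working with the data.

```python
import gzip

# Sender side: text is encoded to bytes, then the bytes are compressed.
text = "Giampaolo Rodolà"
wire = gzip.compress(text.encode("utf-8"))

# Receiver side: undo each transformation in reverse order.
raw = gzip.decompress(wire)      # still bytes, not yet usable as text
decoded = raw.decode("utf-8")    # now a str, i.e. a sequence of characters

print(len(raw), len(decoded))
```

The two lengths differ because "à" takes two bytes in UTF-8 but is a
single character once decoded.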

> I can only imagine how difficult it can be to do such a conversion in
> a project like Twisted or Django where the I/O plays a fundamental
> role.

Twisted actually seems to enforce the bytes / unicode separation quite
well already, so I don't think they should have many problems on that
front. Modern Web frameworks seem to be in the same boat (they already
give the Web developer unicode strings to play with, and handle the
encoding/decoding at the IO boundary transparently).
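
A rough sketch of what "handling the encoding/decoding at the IO
boundary" can look like in Python 3. The function name handle_command,
the reply text and the UTF-8 default are all hypothetical; this is not
taken from Twisted, Django or pyftpdlib, and a real protocol would
dictate the encoding.

```python
def handle_command(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode on the way in, work on str, encode on the way out."""
    # Boundary in: bytes from the socket become text.
    line = raw.decode(encoding).rstrip("\r\n")

    # Everything in the middle deals with characters only.
    verb, _, arg = line.partition(" ")
    reply = f"200 {verb.upper()} received for {arg}"

    # Boundary out: text becomes bytes again before it is sent.
    return (reply + "\r\n").encode(encoding)


response = handle_command("stor été.txt\r\n".encode("utf-8"))
```

With this split, only the first and last lines of the function ever see
bytes; the logic in between never has to care about the encoding.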

> The choice of forcing the user to use Unicode and "think in Unicode"
> was a very brave one, and I'm sure it's for the better, but not
> everyone wants to deal with that because Unicode is hard to swallow.

Could Google fund a project named "Unicode Swallow"?

Regards

Antoine.



