[Python-Dev] email package status in 3.X

Mon Jun 21 00:01:08 CEST 2010

On Sun, Jun 20, 2010 at 11:30 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 6/20/2010 8:26 AM, Giampaolo Rodolà wrote:
>
>> I attempted to port pyftpdlib to python 3 several times and the
>> biggest show stopper has always been the bytes / string difference
>> introduced by Python 3 which forces you to *know* and *use* Unicode
>> every time you deal with some text and 2to3 is completely useless
>> here.
>
> I believe the advice in the wiki porting page is to use unicode() and
> bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do
> fine. For 2.5-, add 'bytes = str' somewhere.

Really? I thought you were supposed to call encode/decode methods on
the appropriate thing, depending on whether it comes from a byte source
or a character source. The problems arise when you're doing things
like paths, which I believe are bytes on *nix and proper Unicode on
Windows (which basically just means they enforce an encoding, UTF-16
if I'm not mistaken). I don't actually use Windows so I might be
completely wrong here.
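
To illustrate the encode/decode approach, here is a minimal sketch in
Python 3 syntax: decode where bytes enter the program, work on text in
between, and encode again just before writing out. The helper names
(from_wire, to_wire), the FTP-style sample line, and the choice of UTF-8
are assumptions for illustration only; a real protocol or file system
dictates its own encoding.

```python
# Hypothetical helpers; the names and the UTF-8 default are assumptions.

def from_wire(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode data that arrived from a byte source (socket, binary file)."""
    return raw.decode(encoding)


def to_wire(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text just before handing it to a byte sink."""
    return text.encode(encoding)


# Bytes come in, are decoded once, and are handled as text from then on.
line = from_wire(b"RETR caf\xc3\xa9.txt\r\n")
command, _, argument = line.strip().partition(" ")

# Text is encoded once, at the point where it leaves the program.
reply = to_wire("150 Opening " + argument + "\r\n")
```

Keeping the conversions at the edges means the code in the middle never
has to guess whether it is holding bytes or characters.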

> 2to3 still gets patches, I believe, when someone exhibits code that could
> and ought to be converted but is not.
>
> I suspect that if you posted 'Problems porting pyftpdlib to Python3', you
> would get some help. If it involved inadequacies in the current tools and
> guides, it would be on-topic here. Or try python-list.
>
>> The choice of forcing the user to use Unicode and "think in Unicode"
>> was a very brave one, and I'm sure it's for the better, but not
>> everyone wants to deal with that because Unicode is hard to swallow.
>
> I felt that way until my daughter decided to switch from Spanish to Japanese
> for her foreign language. Once I quit fighting it, it became much easier
> to swallow and learn. As it turns out, thinking in Unicode is a pretty
> straightforward generalization of thinking in ASCII. There are some annoying
> glitches due to the need to accommodate legacy systems. The plethora of
> legacy encodings for various subsets, besides ASCII, is also a nuisance.

I think doing unicode/str properly in 2.x is very important; #python
stresses it quite often. I think Py3k's strictness is a good idea
because people very often write something that appears to work for a
long time, and then someone tries it with funny bytes and everything
blows apart. Convincing people their software is wrong when
"everything worked five minutes ago" is really hard :-)
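
A small sketch of that failure mode, in Python 3 syntax. The greeting()
function and the sample names are made up for illustration: the code
silently assumes ASCII, so it looks correct until the first non-ASCII
input arrives. The last part shows the strictness in question: Python 3
refuses to mix text and bytes instead of guessing an encoding.

```python
def greeting(raw_name: bytes) -> str:
    # Hidden assumption: every name is plain ASCII.
    return "Hello, " + raw_name.decode("ascii")


# Works for as long as every input happens to be ASCII.
ascii_result = greeting(b"Terry")

# Blows apart on the first name containing non-ASCII bytes.
try:
    greeting("Rodol\u00e0".encode("utf-8"))
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Python 3 rejects mixing str and bytes outright, so the mistake
# surfaces immediately instead of depending on the data.
try:
    "Hello, " + b"Terry"
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
```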

You'd be surprised how long it can take before some of these problems
are found. A couple of weeks ago in #python we had exactly this
problem when we were helping Blender folks. There was a bug report
from a German Blender user; it turns out Blender ignores Unicode in some
critical spot, making importing between people who disagree on charsets
impossible. And Blender isn't exactly a project that's two weeks old
and filled with idiots :) The downside of finding these problems so
late is that *fixing* them becomes a nontrivial task.

The central problem is probably that a lot of people don't understand
Unicode. Recently I learned that even Tanenbaum got it wrong in the
latest revision of his computer networks book! (Although that might
just be my Dutch translation of it being bad.)

> Terry Jan Reedy

Laurens