Hallvard B Furuseth
h.b.furuseth at usit.uio.no
Fri Oct 8 15:31:27 CEST 2010
Arnaud Delobelle writes:
>Hallvard B Furuseth <h.b.furuseth at usit.uio.no> writes:
>> I've been playing a bit with Python3.2a2, and frankly its charset
>> handling looks _less_ safe than in Python 2.
>> With 2.<late>, the equivalent Unicode <-> string conversion did not
>> silently produce garbage: it raised UnicodeError instead. With old
>> raw Python strings that was not a problem in applications which did not
>> need to convert any charsets; with Python 3 they can break.
>> I really wish bytes.__str__ would at least by default fail.
> I think you misunderstand the purpose of str(). It is to provide a
> (unicode) string representation of an object and has nothing to do with
> converting it to unicode:
That's not the point - the point is that for 2.* code which _uses_ str
vs unicode, the equivalent 3.* code uses str vs bytes. Yet not the
same way - a 2.* 'str' will sometimes be 3.* bytes, sometime str. So
upgraded old code will have to expect both str and bytes.
In 2.*, str<->unicode conversion failed or produced the equivalent
character/byte data. Yes, there could be charset problems if the
defaults were set up wrong, but that's a smaller problem than in 3.*.
In 3.*, the bytes->str conversion via str() never fails: it always
_silently_ produces garbage (the repr of the bytes object).
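A minimal sketch of that behaviour in a 3.x interpreter, assuming UTF-8
encoded input (the variable names are illustrative only). str() on a
bytes object returns its repr rather than decoding it, while an explicit
decode() either yields the real text or raises UnicodeDecodeError:

```python
# Bytes as they might arrive from a file or socket (UTF-8 is an assumption).
data = "caf\u00e9".encode("utf-8")

# str() does not decode: it returns the repr of the bytes object, no error.
as_text = str(data)

# An explicit decode with the right charset gives the actual text.
decoded = data.decode("utf-8")

# An explicit decode with the wrong charset fails loudly instead.
failure = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc
```

Running the interpreter with -b makes str() on bytes issue a
BytesWarning, and -bb turns that warning into an error, which gets
close to the "fail by default" behaviour wished for above.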
And lots of code uses both, and needs to convert back and forth. In
particular 3.* code converted from 2.*, or using modules converted
from 2.*. There's a lot of such code, and will be for a long time.
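One way converted code could cope with receiving either type is sketched
below. to_text is a hypothetical helper, not something from the standard
library, and the UTF-8 default is an assumption. It decodes bytes
explicitly and refuses anything else, rather than falling back on str():

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return a str, decoding bytes explicitly."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Raises UnicodeDecodeError on invalid data instead of returning a repr.
        return value.decode(encoding)
    # Refuse other types rather than silently stringifying them.
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

With this, to_text(b"abc") and to_text("abc") both give "abc", while
to_text(b"\xff") raises UnicodeDecodeError.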