steve at REMOVE-THIS-cybersource.com.au
Fri Oct 8 19:50:10 CEST 2010
On Fri, 08 Oct 2010 15:31:27 +0200, Hallvard B Furuseth wrote:
> Arnaud Delobelle writes:
>>Hallvard B Furuseth <h.b.furuseth at usit.uio.no> writes:
>>> I've been playing a bit with Python3.2a2, and frankly its charset
>>> handling looks _less_ safe than in Python 2. (...)
>>> With 2.<late> conversion Unicode <-> string the equivalent operation
>>> did not silently produce garbage: it raised UnicodeError instead.
>>> With old raw Python strings that was not a problem in applications
>>> which did not need to convert any charsets, with python3 they can
>>> silently produce garbage.
>>> I really wish bytes.__str__ would at least by default fail.
>> I think you misunderstand the purpose of str(). It is to provide a
>> (unicode) string representation of an object and has nothing to do with
>> converting it to unicode:
> That's not the point - the point is that for 2.* code which _uses_ str
> vs unicode, the equivalent 3.* code uses str vs bytes. Yet not the same
> way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str. So
> upgraded old code will have to expect both str and bytes.
I'm sorry, this makes no sense to me. I've read it repeatedly, and I
still don't understand what you're trying to say.
> In 2.*, str<->unicode conversion failed or produced the equivalent
> character/byte data. Yes, there could be charset problems if the
> defaults were set up wrong, but that's a smaller problem than in 3.*. In
> 3.*, the bytes->str conversion always _silently_ produces garbage.
So you say, but I don't see it. Why is this garbage?
>>> b = b'abc\xff'
>>> str(b)
"b'abc\\xff'"
That's what I would expect from the str() function called with a bytes
argument. Since decoding bytes requires a codec, which you haven't given,
it can only return a string representation of the bytes.
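A minimal sketch of that behaviour (names here are just illustrative):

```python
# Without a codec, str() on bytes returns the printable representation
# of the bytes object, not decoded text.
b = b'abc\xff'

s = str(b)             # no encoding given: you get the repr as a str
print(s)               # b'abc\xff'  (quotes and prefix included)
print(len(b), len(s))  # 4 vs 10 -- the data was not decoded
```

Note that `len(s)` is 10 because the result is the ten-character string `b'abc\xff'`, including the `b` prefix, the quotes, and the four characters of the `\xff` escape.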
If you want to decode bytes into a string, you need to specify a codec:
>>> str(b, 'latin-1')