char 128? no... 256

Tim Peters tim.one at comcast.net
Wed Feb 12 08:49:54 EST 2003


[Afanasiy]
> Yes I know I asked for a Unicode object in the example code.
> I am trying to emulate what Python is doing. And as you can
> see, it cannot be decoded.

That's right:  a Unicode object can be *en*coded into an 8-bit
representation, but not *de*coded.  8-bit -> Unicode is decoding, Unicode ->
8-bit is encoding.  unicode() is a decoding function, and is working
correctly.  If you want to encode, then, for example,

>>> u'\u1234'.encode('utf-8')
'\xe1\x88\xb4'
>>>

and decoding reverses it:

>>> unicode('\xe1\x88\xb4', 'utf-8')
u'\u1234'
>>>
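
The two directions are inverses of each other.  Here is a minimal sketch of
the round trip as a script.  It avoids 8-bit string literals so that it should
also run unchanged on later Pythons, and the names text, encoded and decoded
are only for illustration:

```python
# Round trip: Unicode -> 8-bit string (encode) -> Unicode (decode).
text = u'\u1234'                    # one Unicode character
encoded = text.encode('utf-8')      # encoding: Unicode -> 8-bit string
decoded = encoded.decode('utf-8')   # decoding: 8-bit string -> Unicode

print(len(text))        # number of characters in the Unicode object
print(len(encoded))     # number of bytes in the UTF-8 form
print(decoded == text)  # the round trip gives back the original
```

The one character becomes three bytes in UTF-8, which is why the encoded form
in the session above has three \x escapes.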

> So if I am getting a unicode object I have no way of converting it to
> ascii and am thus screwed.

There are dozens of ways to change it into an 8-bit string, depending on
which encoding scheme you choose to pass to the .encode() method.
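
To make that concrete, here is a sketch of a few of those ways.  The codec
names and the 'replace' and 'ignore' error handlers are standard; the variable
names are only for illustration.  Note that ASCII has no code for u'\u1234',
so a plain .encode('ascii') raises UnicodeError, and you have to say what
should happen to such characters:

```python
text = u'\u1234'

# Different encodings give different 8-bit strings for the same character.
as_utf8 = text.encode('utf-8')        # 3 bytes
as_utf16 = text.encode('utf-16-be')   # 2 bytes, big-endian, no byte-order mark

# ASCII cannot represent this character, so the default (strict) mode fails.
try:
    text.encode('ascii')
    strict_ok = True
except UnicodeError:
    strict_ok = False

# Error handlers let you pick a lossy conversion instead of an exception.
replaced = text.encode('ascii', 'replace')   # a single question mark
ignored = text.encode('ascii', 'ignore')     # the character is dropped

print(len(as_utf8))
print(len(as_utf16))
print(strict_ok)
print(len(replaced))
print(len(ignored))
```

Which of these is right depends on what is going to read the 8-bit string on
the other end.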





