Need debugging knowhow for my creeping Unicodephobia
mk
mrkafk at gmail.com
Thu Feb 11 11:45:05 EST 2010
kj wrote:
> I have read a *ton* of stuff on Unicode. It doesn't even seem all
> that hard. Or so I think. Then I start writing code, and WHAM:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
>
> (There, see? My Unicodephobia just went up a notch.)
>
> Here's the thing: I don't even know how to *begin* debugging errors
> like this. This is where I could use some help.
>>> a=u'\u0104'
>>>
>>> type(a)
<type 'unicode'>
>>>
>>> nu=a.encode('utf-8')
>>>
>>> type(nu)
<type 'str'>
See what I mean? You encode INTO a string (unicode -> str, i.e. bytes), and decode OUT OF a string (str -> unicode).
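A small round trip may make the direction easier to remember. This is only a sketch: the variable names are mine, and I wrote it with b'' and u'' literals so that it should run unchanged on Python 2.6+ and on Python 3.

```python
# Round trip: unicode text -> bytes (encode), bytes -> unicode text (decode).
text = u'\u0104'              # one code point: LATIN CAPITAL LETTER A WITH OGONEK
raw = text.encode('utf-8')    # encode INTO bytes
back = raw.decode('utf-8')    # decode OUT OF bytes

# One character of text becomes two bytes in UTF-8.
print(repr(raw))
print(len(text))
print(len(raw))

# Decoding with the same codec gets the original text back.
print(back == text)
```

If you always name the codec on both sides like this, Python never has to guess one for you.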
To make matters more complicated, in Python 2 calling .encode() on a str
first DECODES the string into unicode, using the default 'ascii' codec,
and only then encodes the result:
>>> nu
'\xc4\x84'
>>>
>>> type(nu)
<type 'str'>
>>> nu.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
There's logic to this, although it makes my brain want to explode. :-)
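As for where to *begin* debugging: one way is to spell out the hidden step yourself. The sketch below assumes the implicit decode behaves roughly like an explicit .decode('ascii'), which is my approximation of what Python 2 does behind the scenes, not a quote from its source. The names raw, failed and fixed are made up for the example.

```python
raw = b'\xc4\x84'   # the UTF-8 bytes for u'\u0104'

# Approximation of the implicit step: decode with 'ascii' first,
# then encode the resulting unicode object.
failed = None
try:
    raw.decode('ascii').encode('ascii')
except UnicodeDecodeError as err:
    failed = err
    # The exception object carries what you need to start debugging:
    # which codec was used, where it choked, and on which byte.
    print('codec: %s' % err.encoding)
    print('position: %d' % err.start)
    print('offending byte: %r' % err.object[err.start:err.start + 1])

# The usual fix: name the real encoding instead of letting
# the 'ascii' default kick in.
fixed = raw.decode('utf-8')
print(fixed == u'\u0104')
```

So when you see "'ascii' codec can't decode", look for the spot where a byte string is being mixed with unicode (or encoded a second time) without an explicit .decode() first.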
Regards,
mk