Need debugging knowhow for my creeping Unicodephobia

Thu Feb 11 11:45:05 EST 2010

kj wrote:
> I have read a *ton* of stuff on Unicode.  It doesn't even seem all
> that hard.  Or so I think.  Then I start writing code, and WHAM:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
> 
> (There, see?  My Unicodephobia just went up a notch.)
> 
> Here's the thing: I don't even know how to *begin* debugging errors
> like this.  This is where I could use some help.

>>> a = u'\u0104'
>>> type(a)
<type 'unicode'>
>>> nu = a.encode('utf-8')
>>> type(nu)
<type 'str'>
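On Python 3, `unicode` became `str` and the old byte-oriented `str` became `bytes`. Here is a rough translation of the session above under that mapping. It is a translation, not part of the original 2010 session:

```python
# Python 3 translation of the session above:
# Python 2 'unicode' -> Python 3 'str'; Python 2 'str' -> Python 3 'bytes'.
a = '\u0104'            # LATIN CAPITAL LETTER A WITH OGONEK, one code point of text
print(type(a))          # <class 'str'>

nu = a.encode('utf-8')  # encoding turns text INTO bytes
print(type(nu))         # <class 'bytes'>
print(nu)               # b'\xc4\x84'  (one code point, two UTF-8 bytes)
```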

See the direction? You encode unicode INTO a byte string (str), and you decode a byte string OUT OF str back into unicode.
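The encode/decode rule can be sketched as a round trip, again in Python 3 terms (an assumption, since the thread is about Python 2). Python 3 also enforces the direction, because text has no `.decode()` and bytes have no `.encode()`:

```python
a = '\u0104'
nu = a.encode('utf-8')       # encode: text INTO bytes
back = nu.decode('utf-8')    # decode: bytes OUT OF bytes, back into text (same codec)
print(back == a)             # True

# Python 3 only offers each operation in its valid direction.
print(hasattr(a, 'decode'))   # False: str cannot be decoded
print(hasattr(nu, 'encode'))  # False: bytes cannot be encoded
```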

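To make matters more complicated, in Python 2 calling .encode() on a str first implicitly DECODES it into unicode using the default 'ascii' codec, and only then encodes the result. That hidden decode is what fails on any non-ASCII byte: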

>>> nu
'\xc4\x84'
>>> type(nu)
<type 'str'>
>>> nu.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
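As for where to *begin* debugging: the same UnicodeDecodeError appears whenever bytes are decoded with the wrong codec. One starting point is to catch the exception and read its standard attributes (`encoding`, `object`, `start`, `end`, `reason`), which point at the offending byte. This is a minimal Python 3 sketch, and the helper name `describe_decode_error` is hypothetical:

```python
def describe_decode_error(data: bytes, encoding: str = 'ascii') -> str:
    """Hypothetical helper: try to decode `data` and, on failure, report
    which codec failed, at which offset, and on which byte(s)."""
    try:
        data.decode(encoding)
        return 'decoded cleanly'
    except UnicodeDecodeError as exc:
        bad = exc.object[exc.start:exc.end]   # the byte(s) the codec rejected
        return (f'{exc.encoding} failed at offset {exc.start}: '
                f'{bad!r} ({exc.reason})')

print(describe_decode_error('\u0104'.encode('utf-8')))
# ascii failed at offset 0: b'\xc4' (ordinal not in range(128))
print(describe_decode_error('\u0104'.encode('utf-8'), 'utf-8'))
# decoded cleanly
```

Applied to the original error, byte 0xc2 at position 0 is a UTF-8 lead byte (it begins the encodings of U+0080 to U+00BF, for example the no-break space b'\xc2\xa0'). That suggests the data may be UTF-8 being decoded as ASCII.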

There's logic to this, although it makes my brain want to explode. :-)

Regards,
mk