convert \uXXXX to native character set?

Bengt Richter bokr at
Tue Dec 21 23:59:11 CET 2004

On Mon, 20 Dec 2004 12:49:39 +0200, Miki Tebeka <miki.tebeka at> wrote:

>Hello Joe,
>>     Is there any library to convert HTML page with \uXXXX encoded text to
>>    native character set, e.g. BIG5.
>Try: help("".decode)
But the OP wants to en-code, I think. E.g. (I don't know what the Chinese for ichi is ;-)

 >>> ichi = u'\u4e00'
 >>> ichi
 u'\u4e00'
 >>> ichi.encode('big5')
 '\xa4@'

UIAM, that created a two-byte str holding the big5 code for
the single-horizontal-stroke glyph whose unicode code point is U+4E00.

 >>> list(ichi.encode('big5'))
 ['\xa4', '@']
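
For anyone on Python 3 (an assumption on my part; the session above is Python 2), roughly the same check might look like the sketch below. There str is already unicode and encode() returns a bytes object, so the u'' prefix is optional. The names encoded and byte_values are only illustrative.

```python
# Python 3 sketch: encode one unicode character to Big5 and look at the bytes.
ichi = '\u4e00'                  # the single horizontal stroke glyph

encoded = ichi.encode('big5')    # bytes object, two bytes long for this character

# Iterating over bytes yields integers in Python 3, so format them as hex.
byte_values = [hex(b) for b in encoded]
print(byte_values)
```

The second byte is 0x40, which is ASCII '@', which is why the Python 2 list above shows '@' as its second element.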

Going from a big5-encoded str back to unicode then takes de-coding:

 >>> '\xa4@'.decode('big5')
 u'\u4e00'
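
If the page really contains the six literal characters \uXXXX in its text (my reading of the original question, not something confirmed in the thread), one possible approach is to replace each escape with its character and then encode the result. This is only a sketch in Python 3 syntax: the function name unescape_to_big5 and the sample page are made up for illustration, and it does not try to handle surrogate pairs.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_to_big5(text):
    """Turn literal \\uXXXX sequences into characters, then encode as Big5.

    Hypothetical helper for illustration only.
    """
    unescaped = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Characters Big5 cannot represent become &#NNNN; references,
    # which is a reasonable fallback for HTML output.
    return unescaped.encode('big5', errors='xmlcharrefreplace')

page = r'<p>\u4e00</p>'          # raw string: the backslash is literal
result = unescape_to_big5(page)

# Decoding the Big5 bytes gives back the real character.
round_trip = result.decode('big5')
```

The regular expression is used instead of the unicode_escape codec so that other backslashes in the page are left alone.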

Bengt Richter
