unicode
7stud
bbxx789_05ss at yahoo.com
Sun Jul 1 01:26:20 EDT 2007
Based on this example and the error:
-----
u_str = u"abc\u9999"
print u_str
UnicodeEncodeError: 'ascii' codec can't encode character u'\u9999' in
position 3: ordinal not in range(128)
-----
it looks like when I try to display the string, the ascii codec
goes through each character in the string and fails when it reaches a
character whose numerical code is higher than 127, i.e. the
character \u9999.
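A minimal sketch of what seems to be happening, assuming print falls back to the ascii codec when it cannot work out an output encoding (that is my guess about my setup, not something the traceback states). Encoding explicitly with "ascii" reproduces the same error. The helper name try_ascii is made up for this illustration:

```python
def try_ascii(text):
    """Encode text with the ascii codec; return the bytes or the error."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as err:
        # Hand the exception back so its details can be inspected.
        return err

u_str = u"abc\u9999"
result = try_ascii(u_str)

print(repr(result))      # the same kind of error that print raised
print(result.encoding)   # name of the codec that failed
print(result.start)      # index of the first character it could not encode
```

The start attribute points at position 3, the same position named in the error message above.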
In the following example, I use encode() to convert a unicode string
to a regular string:
-----
u_str = u"abc\u9999"
reg_str = u_str.encode("utf-8")
print repr(reg_str)
-----
and the output is:
'abc\xe9\xa6\x99'
1) Why aren't the characters 'a', 'b', and 'c' in hex notation? It
looks like Python must be using the ascii codec to parse the
characters in the string again, with the result that Python converts
only the 1-byte numerical codes to characters.
2) Why didn't that cause an error like the one above for the 3-byte
character?
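A minimal sketch that may help with questions 1) and 2): it lists the numeric value of every byte in the encoded string. As far as I can tell, repr() does no decoding at all. It writes printable ASCII bytes as themselves and every other byte as a \x escape, which would explain why nothing fails. The helper name byte_values is made up for this illustration:

```python
def byte_values(raw):
    """Return every byte of raw as a hex string."""
    # bytearray yields integers on both Python 2 and Python 3.
    return [hex(b) for b in bytearray(raw)]

u_str = u"abc\u9999"
reg_str = u_str.encode("utf-8")

print(byte_values(reg_str))
# → ['0x61', '0x62', '0x63', '0xe9', '0xa6', '0x99']
print(len(u_str))    # 4 characters
print(len(reg_str))  # 6 bytes: 3 for 'abc' plus 3 for \u9999

# 'a' is just the byte 0x61, so both spellings are the same string.
print("abc" == "\x61\x62\x63")
```

So 'a', 'b', and 'c' are stored as the bytes 0x61, 0x62, and 0x63; repr() simply chooses the shorter spelling for them.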
Then if I try this:
---
u_str = u"abc\u9999"
reg_str = u_str.encode("utf-8")
print reg_str
---
I get the output:
abc<some chinese character>
Here it looks like Python isn't using the ascii codec anymore.
3) What determines which codec Python uses?
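A hedged sketch for checking what the interpreter reports about the output stream. sys.stdout.encoding and sys.getdefaultencoding() both exist, but the values they hold depend on the terminal and the Python version, so no expected output is shown for them. My assumption is that printing a regular (byte) string sends the bytes to the terminal unchanged, and that the terminal, not Python, turns the UTF-8 bytes into a character:

```python
import sys

u_str = u"abc\u9999"
reg_str = u_str.encode("utf-8")

# Encoding reported for stdout; may be None when output is piped or redirected.
stdout_encoding = getattr(sys.stdout, "encoding", None)
# Codec used as the fallback for implicit conversions.
default_encoding = sys.getdefaultencoding()

print("stdout encoding: %s" % stdout_encoding)
print("default encoding: %s" % default_encoding)

# Decoding the bytes with the codec that produced them restores the original.
round_trip = reg_str.decode("utf-8")
print(round_trip == u_str)
```

If the stdout encoding comes back as None or as an ascii variant, that would be consistent with the first example failing while the byte string prints fine.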