Character Encodings and display of strings
JKPeck at gmail.com
Mon Nov 13 15:23:23 CET 2006
I am trying to understand why, with non-Western strings, I sometimes get
a hex display and sometimes get the string printed as characters.
With my Python locale set to Japanese, and with or without a
"# coding: cp932" declaration (this is Windows) at the top of the file, I
read Japanese strings into a list, say, catlis.
With this code:

for item in catlis:
    print item
print catlis
print " ".join(catlis)

the first print (print item) displays the Japanese text as characters.
The second print (print catlis) displays a list with the double-byte
characters in hex notation.
The third print (print " ".join(catlis)) prints a combined string of
Japanese characters properly.
According to the print documentation,
"If an object is not a string, it is first converted to a string using
the rules for string conversions",
but converting a list of strings gives a different result from printing
the same strings one by one.
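A minimal sketch of that conversion rule at work. It is written for Python 3 rather than the Python 2 used above, and it assumes `bytes` objects as a stand-in for Python 2 byte strings (the values are copied from the hex display quoted below). Converting a list to a string formats each element with `repr()`, not `str()`, and `repr()` of a byte string escapes non-ASCII bytes as `\x..`:

```python
# Python 3 sketch: bytes objects stand in for Python 2 byte strings.
# Values are copied from the hex display in the post.
catlis = [b'id', b'\x90\xab\x95\xca', b'\x90\xb6\x94N\x8c\x8e\x93\xfa']

# str() of a list is built from the repr() of each element,
# joined with ", " and wrapped in square brackets.
as_list_text = str(catlis)
built_by_hand = "[" + ", ".join(repr(item) for item in catlis) + "]"

print(as_list_text)
print(as_list_text == built_by_hand)  # True
```

Printing each element on its own, or joining the elements into a single string, never goes through `repr()`, which is why those two forms show characters instead of escapes.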
The hex display looks like this:
['id', '\x90\xab\x95\xca', '\x90\xb6\x94N\x8c\x8e\x93\xfa',
'\x8fA\x8aw\x94N\x90\x94', '\x90E\x8e\xed', '\x8b\x8b\x97^',
and correctly shows the hex values of the Japanese characters.
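As a check that the escapes really are the encoded Japanese characters, here is a small Python 3 sketch (again not the Python 2 from the post) that decodes two of the displayed values with the cp932 codec, assuming, as the post states, that the data is cp932-encoded:

```python
# Python 3 sketch: decode escaped bytes copied from the hex display,
# assuming they are cp932-encoded Japanese text as the post states.
raw = [b'\x90\xab\x95\xca', b'\x90\xb6\x94N\x8c\x8e\x93\xfa']

decoded = [item.decode("cp932") for item in raw]

# Each two-byte cp932 sequence decodes to a single character.
print([len(text) for text in decoded])  # [2, 4]

# Encoding back to cp932 recovers the original bytes exactly.
print([text.encode("cp932") for text in decoded] == raw)  # True

# In Python 3, repr() of a str leaves printable non-ASCII characters
# unescaped, so the list of decoded strings shows no \x escapes.
print("\\x" in repr(decoded))  # False
```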
Why are these different?