Unicode characters in byte-strings
steve at REMOVE-THIS-cybersource.com.au
Fri Mar 12 13:35:57 CET 2010
I know this is wrong, but I'm not sure just how wrong it is, or why.
Using Python 2.x:
>>> s = "éâÄ"
>>> print s
éâÄ
>>> list(s)
['\xc3', '\xa9', '\xc3', '\xa2', '\xc3', '\x84']
Can somebody explain what happens when I put non-ASCII characters into a
non-unicode string? My guess is that the result will depend on the
current encoding of my terminal.
In this case, my terminal is set to UTF-8. If I change it to ISO 8859-1,
and repeat the above, I get this:
['\xe9', '\xe2', '\xc4']
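If I do this:
>>> s = u"éâÄ"
>>> s.encode('utf-8')
'\xc3\xa9\xc3\xa2\xc3\x84'
>>> s.encode('iso8859-1')
'\xe9\xe2\xc4'
I get the same bytes as in the two lists above,
which at least explains why the bytes have the values which they do.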
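
The two byte lists above can be reproduced without depending on the terminal at all. This is only a minimal sketch: the names TEXT and byte_list are made up for illustration, and the characters are written as escapes so that neither the source encoding nor the terminal setting affects the result. It should run unchanged on Python 2.6+ and Python 3.3+.

```python
# Sketch only: TEXT and byte_list are illustrative names, not from the post.
# Escapes are used so the result does not depend on the terminal encoding.
TEXT = u"\xe9\xe2\xc4"  # e-acute, a-circumflex, A-diaeresis


def byte_list(text, encoding):
    """Encode text and return each resulting byte as a two-digit hex string."""
    return ["%02x" % b for b in bytearray(text.encode(encoding))]


utf8_bytes = byte_list(TEXT, "utf-8")
latin1_bytes = byte_list(TEXT, "iso-8859-1")

print(utf8_bytes)    # six bytes: two per character
print(latin1_bytes)  # three bytes: one per character
```

The UTF-8 result matches the six-element list seen with the UTF-8 terminal, and the ISO 8859-1 result matches the three-element list, which is consistent with the guess that the byte-string simply holds whatever bytes the terminal sent.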