Using Unicode scripts
yzzzzz at netcourrier.com
Fri Jul 18 14:40:46 CEST 2003
OK, problem solved!
I got the new Python, it all works. I just had to add the UTF-8 BOM myself
(UltraEdit doesn't do it by default) but that wasn't too difficult to do
(copy and paste a ZWNBSP).
One last question: I'm using windows, so the console's encoding is CP437. If
I try to print a unicode string, the string is converted to CP437 and
printed and that works fine. However if I try to print a normal
(non-unicode) string from a UTF-8 encoded file with BOM, for example print
"é", it sends out the two UTF-8 bytes Ã© which appear as lines in the CP437
charset. But if I print the exact same character in a Latin 1 encoded file,
it comes out as the Latin 1 byte for "é" which shows up as a theta in CP437.
This means that Python doesn't take into account the specified encoding
(Latin 1 or UTF-8) and prints out the raw bytes as they appear in the source
file, regardless of the encoding used. Is this normal? (this isn't really a
problem for me as I am only going to use unicode strings now)
More information about the Python-list