[Tutor] character encoding

Kent Johnson kent37 at tds.net
Wed Jul 9 03:32:17 CEST 2008


On Tue, Jul 8, 2008 at 5:19 PM, Robert Johansson
<robert.johansson at math.umu.se> wrote:
> Hi, I'm puzzled by the character encodings which I get when I use Python
> with IDLE. The string '\xf6' represents a letter in the Swedish alphabet
> when coded with utf8. On our computer with MacOSX this gets coded as
> '\xc3\xb6' which is a string of length 2. I have configured IDLE to encode
> utf8 but it doesn't make any difference.

I think you may be a bit confused about utf-8. '\xf6' is not the utf-8
encoding of any character. U+00F6 is the Unicode (not utf-8) code point
for LATIN SMALL LETTER O WITH DIAERESIS. The single byte '\xf6' is the
Latin-1 encoding of this character. The utf-8 encoding of this
character is the two-byte sequence '\xc3\xb6', so the length-2 string
you see is what correct utf-8 looks like.
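
Here is a small sketch of the difference between the code point and its
two encodings. It assumes Python 3, where text (str) and bytes are
separate types; the variable names are only for illustration.

```python
import unicodedata

# The Unicode code point U+00F6, held as a one-character text string.
ch = "\u00f6"
print(unicodedata.name(ch))  # LATIN SMALL LETTER O WITH DIAERESIS

# The same character encoded two different ways.
latin1 = ch.encode("latin-1")  # one byte
utf8 = ch.encode("utf-8")      # two bytes
print(latin1, len(latin1))     # b'\xf6' 1
print(utf8, len(utf8))         # b'\xc3\xb6' 2

# The lone byte 0xf6 is not valid utf-8, so decoding it as utf-8 fails.
try:
    b"\xf6".decode("utf-8")
    lone_byte_is_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_utf8 = False
print(lone_byte_is_utf8)       # False

# Decoding each byte string with its own codec gives back the same character.
assert latin1.decode("latin-1") == utf8.decode("utf-8") == ch
```

If that matches what you see, then the length-2 string is just the
utf-8 bytes of the character and nothing is being mangled.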

Can you give some more specific details about what you do and what you
see? Also you might want to do some background reading on Unicode;
this is a good place to start:
http://www.joelonsoftware.com/articles/Unicode.html

Kent

