[Tutor] unicode and character sets

Kent Johnson kent37 at tds.net
Thu Aug 16 14:11:34 CEST 2007

tpc247 at gmail.com wrote:
> http://www.joelonsoftware.com/articles/Unicode.html
> I realize the following: It does not make sense to have a string without 
> knowing what encoding it uses.  There is no such thing as plain text.

Good start!

> Ok.  Fine.  In Mozilla, by clicking on View, Character Encoding, I find 
> out that the text in the file I grab from:
> http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/index.html
> is encoded in ISO-8859-1.  So I go about changing Python's default 
> encoding according to:
> http://www.diveintopython.org/xml_processing/unicode.html

I don't think this is necessary. Did it actually fix anything? Changing 
the default encoding is not recommended because it makes your scripts 
depend on a site-specific setting, so they are not portable.
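
As a rough sketch of the alternative, the encoding can be named where the 
data is read instead of being set globally. This is written in Python 3 
syntax, not the Python 2 of this thread. The sample line and the names 
downloaded, reader, code and name are made up for illustration; only the 
ISO-8859-1 encoding comes from the question.

```python
import io

# Hypothetical stand-in for the downloaded file: one line of ISO-8859-1 bytes.
downloaded = io.BytesIO(b"AX;\xc5land Islands\n")

# Name the encoding at the point of reading; no global default is changed.
reader = io.TextIOWrapper(downloaded, encoding="iso-8859-1")
line = reader.readline().rstrip("\n")

# Split the (made-up) "code;name" layout into its two fields.
code, name = line.split(";")

# The first character of the name is now the code point U+00C5.
print(code, hex(ord(name[0])))
```

The same idea applies to any other source of bytes: decode once, with an 
explicit encoding, at the boundary where the data enters the program.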

> BUT the LATIN CAPITAL LETTER A WITH RING ABOVE character still displays 
> in IDLE as \xc5 !  I can get the character to display correctly if I type:
> print "\xc5"

In many cases IDLE will display the repr() of a string, which shows any 
non-ASCII character as a hexadecimal escape. It is actually the correct 
character. print does not use the repr(), so it displays correctly.
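
A small sketch of the difference, in Python 3 syntax, where a bytes object 
plays the role that a plain str played in the Python 2 of this thread. The 
variable names are illustrative only.

```python
import unicodedata

# The bytes as they would arrive from an ISO-8859-1 encoded file.
raw = b"\xc5land Islands"

# repr() is what an interactive shell echoes: non-ASCII bytes become escapes.
shown = repr(raw)
print(shown)

# Decoding gives the real text; the first character is the expected letter.
text = raw.decode("iso-8859-1")
first_name = unicodedata.name(text[0])
print(first_name)
```

The escape in the repr() and the decoded character are two views of the 
same data; nothing is lost or corrupted in between.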

> which is fine if I am simply going to copy and paste the select element 
> into my html file.  However, I want to be able to dynamically generate 
> the html form page and have the character in question display correctly 
> in the web browser.
> The problem, of course, is that if I run my script that creates the 
> select element in IDLE I continue to see the output:
> <option value='AX'>\xc5land Islands</option>
> Am I doing something wrong ?

No, actually you are doing great. This is correct output; it is just not 
displayed in the form you expect. The data is correct.
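
For the generated page, one possible approach (an assumption on my part, 
not something from the question) is to encode the markup explicitly when 
writing it out. The sketch below is Python 3, and option_element is a 
hypothetical helper. It shows two choices: ISO-8859-1 bytes, which need the 
page to declare that charset, and pure ASCII with a numeric character 
reference, which is independent of the page's charset.

```python
from html import escape

def option_element(code, name):
    """Build an <option> element as text (hypothetical helper)."""
    return "<option value='{}'>{}</option>".format(
        escape(code, quote=True), escape(name)
    )

name = b"\xc5land Islands".decode("iso-8859-1")
markup = option_element("AX", name)

# Choice 1: ISO-8859-1 bytes; the page must declare charset=ISO-8859-1.
latin1_bytes = markup.encode("iso-8859-1")

# Choice 2: ASCII only; non-ASCII characters become references like &#197;.
ascii_bytes = markup.encode("ascii", "xmlcharrefreplace")

print(repr(latin1_bytes))
print(repr(ascii_bytes))
```

Either way the repr() output will still contain escapes when inspected in a 
shell; what matters is the bytes the browser receives.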

