[Tutor] Code critique and unicode questions

Kent Johnson kent_johnson at skillsoft.com
Fri Aug 27 05:24:52 CEST 2004


The input strings need to be converted to unicode strings using the 
decode() method. Here is an annotated example. The special characters don't 
come out right in the email, so I describe them in angle brackets.

 >>> s=raw_input()
(I type ALT-0229 on the numeric keypad. The console shows /<greek letter sigma>.)
 >>> s
'/\xe5'
 >>> print s
(The console shows /<greek letter sigma>)
 >>> u=s.decode('cp1252')
 >>> u
u'/\xe5'
 >>> print u
(Now it's a unicode string so it shows <a with ring above>)
/å

I don't fully understand this. Here is what I think is going on:
- The result of raw_input() is a plain (byte) string. When it is printed, 
the console interprets it as DOS codepage 437, where byte value 229 (0xE5) 
is <greek letter sigma>.
- s.decode('cp1252') says: interpret this string as cp1252 and convert it 
to Unicode. The character value is still 229, but in cp1252 and in Unicode 
that value means <a with ring above>.
- Printing the unicode string gives the correct result.
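
The same idea can be sketched without a console. This is only an 
illustration, and it is written in Python 3 syntax (where the plain string 
is a bytes literal and decode() returns a str), not the Python 2 session 
above. The names raw, as_dos and as_windows are made up for the example. It 
decodes the same two bytes once as cp437 and once as cp1252:

```python
# The two bytes from the session above: '/' followed by byte value 229 (0xE5).
raw = b'/\xe5'

# Interpreted as DOS codepage 437, byte 0xE5 is GREEK SMALL LETTER SIGMA.
as_dos = raw.decode('cp437')

# Interpreted as Windows codepage 1252, byte 0xE5 is
# LATIN SMALL LETTER A WITH RING ABOVE.
as_windows = raw.decode('cp1252')

# ascii() escapes non-ASCII characters, so the output does not depend on
# what the terminal is able to display.
print(ascii(as_dos))       # '/\u03c3'
print(ascii(as_windows))   # '/\xe5'
```

The bytes never change; only the codepage used to interpret them does, 
which would explain why the same input shows up as two different characters.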

There is a way to set a default encoding, but it doesn't seem to affect this 
problem. What you do is create a file called sitecustomize.py in 
Python\Lib\site-packages. In this file put the two lines
import sys
sys.setdefaultencoding('cp1252')

I'm not sure what this is supposed to do but you could play around with it 
and see if anything changes :-)
You have to call setdefaultencoding() in sitecustomize.py; if you call it 
in your program you will get an error, because the function is removed from 
the sys module once startup has finished.

Thanks Jeff for jogging my memory!
HTH
Kent

At 02:02 PM 8/24/2004 +0200, Ole Jensen wrote:
>2) If the user enters a special character (like above), the characters in
>the outputted filename get even more messed up (it looks like something
>from the greek alphabet!).




