[Tutor] Code critique and unicode questions
Kent Johnson
kent_johnson at skillsoft.com
Fri Aug 27 05:24:52 CEST 2004
The input strings need to be converted to unicode strings using the
decode() method. Here is an annotated example. It doesn't come out right in
the email, so I will describe what the console shows:
>>> s=raw_input()
(I type ALT-0229 on the numeric keypad. The console shows /<greek letter sigma>)
>>> s
'/\xe5'
>>> print s
(The console shows /<greek letter sigma>)
>>> u=s.decode('cp1252')
>>> u
u'/\xe5'
>>> print u
(Now it's a unicode string so it shows <a with ring above>)
/å
I don't fully understand this. Here is what I think is going on:
- the result of raw_input() is a plain string. When it is printed it is
interpreted as DOS codepage 437, where codepoint 229 is <greek letter sigma>.
- string.decode() says, interpret this as a string in cp1252 and convert it
to Unicode. Now, even though the character value is the same, the system is
interpreting it as <a with ring above>
- printing the unicode string gives the correct result
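
To make the interpretation above concrete, here is a small sketch that decodes the same two bytes under both codepages. The variable names are mine, and it uses unicodedata.name() rather than print so the result does not depend on how the console renders characters. It is only an illustration of the idea, not a fix for the original program.

```python
import unicodedata

# The two bytes that raw_input() returned in the session above.
raw = b'/\xe5'

# Same bytes, two different codepages.
as_dos = raw.decode('cp437')       # how the DOS console interprets them
as_windows = raw.decode('cp1252')  # how the decode() call above interprets them

# Name the second character in each result instead of printing it.
dos_name = unicodedata.name(as_dos[1])
windows_name = unicodedata.name(as_windows[1])
print(dos_name)      # GREEK SMALL LETTER SIGMA
print(windows_name)  # LATIN SMALL LETTER A WITH RING ABOVE

# For comparison: the byte that stands for a-with-ring in cp437 is different.
ring_in_dos = as_windows[1].encode('cp437')
print(repr(ring_in_dos))
```

So byte 229 is only "the same character" once you have said which codepage it belongs to; decode() is where you say it.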
There is a way to set a default encoding, but it doesn't seem to affect this
problem. What you do is create a file called sitecustomize.py in
Python\Lib\site-packages. In this file put the two lines
import sys
sys.setdefaultencoding('cp1252')
I'm not sure what this is supposed to do but you could play around with it
and see if anything changes :-)
You have to call setdefaultencoding() in sitecustomize.py; if you call it
in your program you will get an error.
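
As far as I know, the reason for the error is that site.py deletes sys.setdefaultencoding once startup (including sitecustomize.py) has finished, so the name is simply gone by the time your program runs. A quick check along these lines should show it; treat it as a sketch, since the details depend on your Python version and how the interpreter was started.

```python
import sys

# The encoding currently used for implicit conversions between
# plain strings and unicode strings.
current = sys.getdefaultencoding()
print(current)

# After a normal startup the setter is no longer available, which is
# why calling it from your own program raises AttributeError.
has_setter = hasattr(sys, 'setdefaultencoding')
print(has_setter)
```

If the first line printed still says ascii after you have created sitecustomize.py, the file is probably not being picked up.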
Thanks Jeff for jogging my memory!
HTH
Kent
At 02:02 PM 8/24/2004 +0200, Ole Jensen wrote:
>2) If the user enters a special character (like above), the characters in
>the outputted filename gets even more messed up (it looks like something
>from the greek alphabet!).