[Tutor] Declaring encoding question

Mon Aug 22 18:12:00 CEST 2005

mailing list wrote:
> Hi all, 
> 
> I've got some source code which will be handling non-ASCII chars like
> umlauts and what not, and I've got a testing portion stored in the
> code.
> 
> I get this deprecation warning when I run my code - 
> __main__:1: DeprecationWarning: Non-ASCII character '\xfc' in file
> C:\Python24\testit.py on line 733, but no encoding declared; see
> http://www.python.org/peps/pep-0263.html for details
> 
> I'm reading this - http://www.python.org/peps/pep-0263.html
> 
> Now, the non-ASCII character is in the test data, so it's not actually
> part of my code.
> Will Python be able to handle \xfc and company in data without my
> telling it to use a different form of encoding?

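You should tell Python what the encoding is. The non-ASCII character is part of the source file, even if it only appears in test data, so Python needs to know how to decode it. Just include the line
# -*- coding: cp1252 -*-
on the first or second line of the file; PEP 263 only recognizes the declaration there. Use whatever encoding your editor actually saves the file in.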
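
To illustrate what the declaration does, here is a minimal sketch. It assumes Python 3 rather than the Python 2.4 used in this thread, and the file name "testit_demo.py" and the variable names are made up for the example. It builds source bytes the way an editor saving in cp1252 would, then lets compile() read the coding line:

```python
# Sketch (Python 3): compile source bytes that carry a PEP 263 declaration.
# "testit_demo.py" and the variable names are hypothetical.

# Build the file contents as bytes, as an editor saving in cp1252 would.
source_text = '# -*- coding: cp1252 -*-\nunit = "J\xe4ger"\n'
source_bytes = source_text.encode("cp1252")

# In cp1252 the a-umlaut is stored as the single byte 0xE4.
assert b"\xe4" in source_bytes

# compile() reads the coding line and decodes the remaining bytes with it.
namespace = {}
exec(compile(source_bytes, "testit_demo.py", "exec"), namespace)
unit = namespace["unit"]

print(ascii(unit))  # prints 'J\xe4ger'
```

The string literal comes out as the five characters of Jäger because the declared encoding matches the bytes in the file.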

> When I run the code, and get my returned data, it looks like this in
> Pythonwin -
> 
> 
> >>> print j["landunits"].keys()
> 
> ['"J\xe4ger"', '"Deutschmeister"', '"Army of Bohemia"',
> '"Gardegrenadiere"', '"K.u.K Armee"', '"Erzherzog"', '"Army of
> Italy"', '"Army of Silesia"', '"Army of Hungary"']
> 
> So J\xe4ger is actually Jäger. When I run it slightly differently - 
> 
> >>> for item in j["landunits"].keys():
> ...     print item
> ...
> "Jäger"
> "Deutschmeister"
> "Army of Bohemia"
> "Gardegrenadiere"
> "K.u.K Armee"
> "Erzherzog"
> "Army of Italy"
> "Army of Silesia"
> "Army of Hungary"
> 
> It prints the umlauted 'a' fine and dandy. 

You are seeing the difference between printing a string and printing its repr().
When you print a list (which is what j["landunits"].keys() is), Python prints the repr() of each element of the list. repr() of a string shows non-ASCII characters as \x escapes; that's why you get J\xe4ger. When you print the string itself, the non-ASCII characters are sent to the terminal unchanged, so the terminal displays them as ä and so on.
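
The same distinction can be sketched in a few lines. This assumes Python 3, where repr() no longer escapes printable non-ASCII characters; the built-in ascii() is used here as a stand-in for what repr() did in Python 2. The variable names are made up for the example:

```python
# Sketch (Python 3): str() of a list uses repr() of each element.
unit = '"J\xe4ger"'                 # the text "Jäger", quote marks included
units = [unit, '"Deutschmeister"']

# Converting the list to text calls repr() on every element.
listed = str(units)
rebuilt = "[" + ", ".join(repr(u) for u in units) + "]"
assert listed == rebuilt

# ascii() escapes non-ASCII characters the way Python 2's repr() did.
escaped = ascii(unit)
print(escaped)  # prints '"J\xe4ger"'

# str() of the string itself leaves the character untouched.
plain = str(unit)
```

Here escaped contains a literal backslash followed by xe4, while plain still contains the single character ä.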

Kent