os.popen encoding!

Wed Feb 18 07:38:54 EST 2009

On Wed, 18 Feb 2009 10:09:24 -0200, SMALLp <pofuk at email.t-com.hr> wrote:

> Hi.
> I'm playing with the os.popen function.
> a = os.popen("somecmd").read()
>
> If one of the lines contains characters like "è", "æ" or any other, it looks
> like this: "velja\xe8a 2009", with that "\xe8". It prints fine if I go:
>
> for i in a:
>     print i
>

'\xe8' is a *single* byte (not four characters). It is the Unicode code
point 'LATIN SMALL LETTER E WITH GRAVE' (u'è') encoded in Windows-1252
(latin-1 and several other encodings use the same byte). Windows-1252 is
the usual Windows encoding for "Western Europe" locales, which in practice
cover a much larger part of the world: most of the Americas, if not all.
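
To make the "single byte" point concrete, here is a small sketch. It is
written so that it should run unchanged on Python 2.6+ and Python 3 (the
b"..." literal is not available in 2.5). The variable names are only
illustrative, and the cp1250 line is an added contrast, not something taken
from the original question: it shows that the same byte can stand for a
different letter under another Windows code page.

```python
import unicodedata

raw = b"velja\xe8a 2009"   # bytes as they might arrive from the pipe

single = raw[5:6]          # slicing keeps a byte string on Python 2 and 3
byte_count = len(single)   # one byte, although its repr shows four characters

as_cp1252 = single.decode("cp1252")   # Windows "Western Europe" code page
as_latin1 = single.decode("latin-1")  # same result for this particular byte
as_cp1250 = single.decode("cp1250")   # Central European code page, for contrast

name_cp1252 = unicodedata.name(as_cp1252)
name_cp1250 = unicodedata.name(as_cp1250)

print(byte_count)
print(name_cp1252)
print(name_cp1250)
```

Which encoding is the right one depends on how the command produced its
output; the byte itself carries no label.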

When you *look* at some string in the interpreter, you see its repr()  
(note the surrounding quotes). When you *print* some string, you get its  
contents:

py> s = "ma mère"
py> s
'ma m\x8are'
py> print s
ma mère
py> print repr(s)
'ma m\x8are'
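
The same distinction can be checked in code. The sketch below assumes the
session above was run in a Windows console using code page 850 (an
assumption; code page 437 maps this byte the same way), which would explain
why the byte shown there is \x8a rather than \xe8. It should run on
Python 2.6+ and Python 3.

```python
s = b"ma m\x8are"              # the seven bytes from the session above

shown = repr(s)                # what the interpreter echoes for a bare `s`
length = len(s)                # counts bytes, not characters of the repr
has_escape = "\\x8a" in shown  # the repr spells the byte out as four characters

# Assumption: the console encoding is cp850; decoding recovers the letter.
decoded = s.decode("cp850")

print(length)
print(shown)
```

The repr is longer than the string it describes; the data itself is
unchanged.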

> How do I solve this, and where exactly is the problem: with print or with
> read? Windows XP, Python 2.5.4

There is *no* problem. You should read the Unicode howto:  
<http://docs.python.org/howto/unicode.html>
If you still think there is a problem, please provide more details.
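
If the goal is to handle the command's output as Unicode text rather than
as raw bytes, one possible approach is to decode it explicitly. This is only
a sketch under stated assumptions: read_command_text is a hypothetical
helper, the child Python process merely stands in for "somecmd", and cp1252
is assumed to be the encoding the command writes in. It uses subprocess
instead of os.popen so that the bytes arrive undecoded on both Python 2.6+
and Python 3.

```python
import subprocess
import sys

def read_command_text(args, encoding):
    """Run a command and return its standard output decoded to Unicode."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    data, _ = proc.communicate()   # raw bytes, exactly as the child wrote them
    return data.decode(encoding)

# Stand-in for "somecmd": a child Python that writes one cp1252-encoded line.
child_code = (
    "import sys; "
    "getattr(sys.stdout, 'buffer', sys.stdout).write(b'velja\\xe8a 2009')"
)
text = read_command_text([sys.executable, "-c", child_code], "cp1252")

print(repr(text))
```

Passing the wrong encoding name does not necessarily raise an error; it may
simply yield a different character, so the encoding has to be known from
the command, not guessed from the bytes.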

-- 
Gabriel Genellina