os.popen encoding!

Wed Feb 18 19:38:45 EST 2009

Thanks for help!

My problem was actually:
>>> a = ["velja\xe8a 2009"]
>>> print a    #will print
['velja\xe8a 2009']
>>> print a[0]    #will print
veljača 2009
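
The difference shown above is between the repr() of a list (which escapes non-ASCII bytes) and printing one element directly. Here is a minimal sketch of the same effect, written in Python 3 syntax where the command output would be a bytes object. The sample data is made up, and cp1250 is only an assumed encoding, following the suggestion in the quoted reply.

```python
# Hypothetical sample: raw bytes as they might arrive from the command.
raw = b"velja\xe8a 2009"
items = [raw]

# Converting the list to text uses repr() for each element,
# so the non-ASCII byte shows up as the escape \xe8.
list_view = str(items)

# Decoding the element itself (assuming Windows-1250) gives readable text.
text = items[0].decode("cp1250")

print(list_view)
print(text)
```

The data is identical in both cases; only the way it is displayed differs.
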

"Hrvoje Niksic" <hniksic at xemacs.org> wrote in message 
news:87ocwzzvym.fsf at mulj.homelinux.net...
> "Gabriel Genellina" <gagsl-py2 at yahoo.com.ar> writes:
>
>>> I'm playing with os.popen function.
>>> a = os.popen("somecmd").read()
>>>
>>> If one of the lines contains characters like "e", "a" or any other, it
>>> looks like this: "velja\xe8a 2009", with that "\xe8". It prints fine if I go:
>>>
>>> for i in a:
>>>     print i
>>
>> '\xe8' is a *single* byte (not four). It is the 'LATIN SMALL LETTER E
>> WITH GRAVE' Unicode code point u'è' encoded in the Windows-1252
>> encoding (and latin-1, and others too).
>
> Note that it is also 'LATIN SMALL LETTER C WITH CARON' (U+010D or
> u'č'), encoded in Windows-1250, which is what the OP is likely using.
>
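
To illustrate the point that one byte can stand for different characters, here is a short Python 3 sketch that decodes the same byte with the three codecs mentioned in this thread. It uses only the standard library; the variable names are illustrative.

```python
import unicodedata

byte = b"\xe8"

# The same single byte, interpreted under different 8-bit encodings.
as_cp1252 = byte.decode("cp1252")
as_latin1 = byte.decode("latin-1")
as_cp1250 = byte.decode("cp1250")

# unicodedata.name() returns the official Unicode character name.
for label, ch in [("cp1252", as_cp1252),
                  ("latin-1", as_latin1),
                  ("cp1250", as_cp1250)]:
    print(label, "U+%04X" % ord(ch), unicodedata.name(ch))
```

Under cp1252 and latin-1 the byte decodes to U+00E8, while under cp1250 it decodes to U+010D.
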
> The rest of your message stands regardless: there is no problem, at
> least as long as the OP only prints out the character received from
> somecmd to something else that also expects Windows-1250.  The problem
> would arise if the OP wanted to store the string in a PyGTK label
> (which expects UTF-8) or send it to a web browser (which expects an
> explicit encoding, probably defaulting to UTF-8), in which case he'd
> have to disambiguate whether '\xe8' refers to U+010D or to U+00E8 or
> something else entirely.
>
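
A rough sketch of why that disambiguation matters when the target expects UTF-8: the same input bytes produce different UTF-8 output depending on which source encoding is assumed. This is Python 3; the helper name to_utf8 and the sample data are hypothetical.

```python
# Hypothetical sample: bytes as received from the external command.
raw = b"velja\xe8a 2009"

def to_utf8(data, source_encoding):
    """Decode with the assumed source encoding, then re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

utf8_if_cp1250 = to_utf8(raw, "cp1250")  # \xe8 taken as U+010D
utf8_if_cp1252 = to_utf8(raw, "cp1252")  # \xe8 taken as U+00E8

print(utf8_if_cp1250)  # b'velja\xc4\x8da 2009'
print(utf8_if_cp1252)  # b'velja\xc3\xa8a 2009'
```

Both results are valid UTF-8, so a wrong guess would not raise an error; it would simply display the wrong letter.
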
> That is the problem that Python 3 solves by requiring (or strongly
> suggesting) that such disambiguation be performed as early in the
> program as possible, preferably while the characters are being read
> from the outside source.  A similar approach is possible using Python
> 2 and its unicode type, but since the OP never specified exactly which
> problem he had (except for the repr/str confusion), it's hard to tell
> if using the unicode type would help.
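
As a sketch of the "decode as early as possible" approach described above, the following Python 3 example reads a command's output as bytes and decodes it immediately at the boundary. Two things are assumptions: the child process is only a stand-in for the unspecified "somecmd", and cp1250 is a guess at the encoding it uses.

```python
import subprocess
import sys

# Assumption: the external command emits Windows-1250 text.
ASSUMED_ENCODING = "cp1250"

# Stand-in for "somecmd": a child Python process that writes raw bytes.
child_code = r"import sys; sys.stdout.buffer.write(b'velja\xe8a 2009\n')"
cmd = [sys.executable, "-c", child_code]

# Capture stdout as bytes, then decode once, right where the data enters.
result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
text = result.stdout.decode(ASSUMED_ENCODING)

# From here on the program works with text, not bytes.
lines = text.splitlines()
print(len(lines), "line(s) read")
```

After the single decode step the rest of the program no longer needs to know which encoding the command used, and the text can be re-encoded for whatever output is required.
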