pofuk at email.t-com.hr
Thu Feb 19 01:38:45 CET 2009
Thanks for the help!
My problem was actually:
>>> a = ["velja\xe8a 2009"]
>>> print a #will print
>>> Print a #will print
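Here is a minimal Python 3 sketch of the repr/str confusion. It uses the byte string from the post and assumes it is Windows-1250 text; the OP never confirmed that codec. A Python 3 `bytes` object stands in for the Python 2 `str` used in the thread:

```python
# Python 3 sketch: a bytes object plays the role of the Python 2 str
# that the OP got back from os.popen().
a = [b"velja\xe8a 2009"]

# Converting a list to text uses repr() on each element, so the
# non-ASCII byte 0xE8 is shown as the escape sequence \xe8.
shown = str(a)
print(shown)  # [b'velja\xe8a 2009']

# The element itself holds the raw byte. Decoding it with the
# (assumed) Windows-1250 codec yields the intended character.
# ascii() keeps the printed output pure ASCII on any terminal.
text = a[0].decode("cp1250")
print(ascii(text))  # 'velja\u010da 2009'
```

Printing the list goes through `repr()`, which escapes the byte. Decoding the element gives the real character.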
"Hrvoje Niksic" <hniksic at xemacs.org> wrote in message
news:87ocwzzvym.fsf at mulj.homelinux.net...
> "Gabriel Genellina" <gagsl-py2 at yahoo.com.ar> writes:
>>> I'm playing with the os.popen function.
>>> a = os.popen("somecmd").read()
>>> If one of the lines contains characters like "e", "a" or any other, it
>>> gives me a line like this "velja\xe8a 2009" with that "\xe8". It prints fine if I go:
>>> for i in a:
>>>     print i
>> '\xe8' is a *single* byte (not four). It is the 'LATIN SMALL LETTER E
>> WITH GRAVE' Unicode code point u'è' encoded in the Windows-1252
>> encoding (and latin-1, and others too).
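To make the byte count concrete, here is a small Python 3 check (not from the thread). It confirms that `\xe8` in a bytes literal is one byte, even though it takes four characters to type, and that Windows-1252 and Latin-1 both map it to U+00E8:

```python
import unicodedata

# In a bytes literal the escape \xe8 denotes a single byte...
raw = b"\xe8"
# ...even though it takes four characters to type in source code.
typed = r"\xe8"
print(len(raw), len(typed))  # 1 4

# In Windows-1252 (and Latin-1) this byte is U+00E8.
ch = raw.decode("cp1252")
print(hex(ord(ch)), unicodedata.name(ch))  # 0xe8 LATIN SMALL LETTER E WITH GRAVE
```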
> Note that it is also 'LATIN SMALL LETTER C WITH CARON' (U+010D or
> u'č'), encoded in Windows-1250, which is what the OP is likely using.
> The rest of your message stands regardless: there is no problem, at
> least as long as the OP only prints out the character received from
> somecmd to something else that also expects Windows-1250. The problem
> would arise if the OP wanted to store the string in a PyGTK label
> (which expects UTF-8) or send it to a web browser (which expects an
> explicit encoding, probably defaulting to UTF-8), in which case he'd
> have to disambiguate whether '\xe8' refers to U+010D or to U+00E8 or
> something else entirely.
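The following Python 3 sketch illustrates the ambiguity Hrvoje describes. The same byte decodes to different characters under Windows-1252 and Windows-1250, and only decoded text can be re-encoded safely as UTF-8 for a consumer such as a GTK label or a web page. Choosing `cp1250` at the end is an assumption about the OP's system:

```python
import unicodedata

raw = b"\xe8"

# The same byte means different characters depending on the codec.
for codec in ("cp1252", "cp1250"):
    ch = raw.decode(codec)
    print(codec, "U+%04X" % ord(ch), unicodedata.name(ch))
# cp1252 U+00E8 LATIN SMALL LETTER E WITH GRAVE
# cp1250 U+010D LATIN SMALL LETTER C WITH CARON

# Only after picking a codec (here, assumed Windows-1250) can the text
# be re-encoded as UTF-8 for a UTF-8 consumer.
as_utf8 = raw.decode("cp1250").encode("utf-8")
print(as_utf8)  # b'\xc4\x8d'
```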
> That is the problem that Python 3 solves by requiring (or strongly
> suggesting) that such disambiguation be performed as early in the
> program as possible, preferably while the characters are being read
> from the outside source. A similar approach is possible using Python
> 2 and its unicode type, but since the OP never specified exactly which
> problem he had (except for the repr/str confusion), it's hard to tell
> if using the unicode type would help.
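The next example sketches decoding as early as possible in Python 3, assuming the command's output is Windows-1250. To keep the example runnable anywhere, `somecmd` is replaced by a hypothetical child Python process that writes the bytes from the post. `subprocess.run` with `encoding=` decodes the output before the rest of the program sees it. In Python 2 the rough equivalent would be calling `.decode("cp1250")` on the string returned by `os.popen(...).read()`, which produces a `unicode` object.

```python
import subprocess
import sys

# Hypothetical stand-in for "somecmd": a child Python process that
# writes Windows-1250 bytes (0xE8 = c with caron) to its stdout.
cmd = [sys.executable, "-c",
       "import sys; sys.stdout.buffer.write(b'velja\\xe8a 2009\\n')"]

# Decode at the boundary: encoding= makes subprocess return str,
# decoded with the codec we assume the command uses.
result = subprocess.run(cmd, capture_output=True, encoding="cp1250", check=True)
lines = result.stdout.splitlines()

# From here on the program handles text, not bytes; ascii() keeps the
# printed output pure ASCII.
print(ascii(lines))  # ['velja\u010da 2009']
```

The design choice is to name the codec once, where the data enters the program. Everything downstream then works with unambiguous text and can encode it to whatever a GUI toolkit or web page needs.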