unicode to string conversion
Jeff Epler
jepler at unpythonic.net
Thu May 8 15:29:30 EDT 2003
On Thu, May 08, 2003 at 01:24:33PM -0500, Skip Montanaro wrote:
>
> Luca> I would like to translate
>
> Luca> u'questa \xe8 bella'
> Luca> into
> Luca> 'questa è bella'
>
> Luca> and put the result into a new variable
>
> I love easy questions!
>
> >>> u = u'questa \xe8 bella'
> >>> s = u.encode("iso-8859-1")
> >>> print s
> questa è bella
Huh, that doesn't work here:
>>> u = u'questa \xe8 bella'
>>> s = u.encode("iso-8859-1")
>>> print s
questa [] bella
where [] is a box-shaped character displayed for an invalid byte
sequence.
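Here is a small Python 3 sketch of where the box comes from. The session above is Python 2, where print s writes the raw bytes to the terminal. The Latin-1 encoding of u'\xe8' is the lone byte 0xE8, which is not valid UTF-8. Decoding it with errors="replace" yields U+FFFD, the replacement character that many terminals draw as a box. This assumes the terminal decodes its input as UTF-8.

```python
# Python 3 sketch: Latin-1 encodes u'\xe8' as the single byte 0xE8.
latin1_bytes = "questa \xe8 bella".encode("iso-8859-1")

# A UTF-8 terminal tries to read those bytes as UTF-8. 0xE8 starts a
# three-byte sequence, but the next byte (0x20, a space) is not a
# continuation byte, so the sequence is invalid.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)

# With errors="replace" the bad byte becomes U+FFFD, the replacement
# character. ascii() keeps the printed form safe on any terminal.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(shown))
```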
On my system, I must write
>>> print u.encode("utf-8")
questa è bella
to get the proper result, and on some Windows systems you would probably write
>>> print u.encode("cp850")
to do the deed.
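A Python 3 sketch showing that the same text becomes different bytes under each of the encodings mentioned here. That is why the "right" answer depends on what the terminal expects: each byte string displays correctly only where that encoding is in use.

```python
text = "questa \xe8 bella"

# Encode the same text with each codec mentioned in this thread.
encoded = {codec: text.encode(codec)
           for codec in ("iso-8859-1", "utf-8", "cp850")}

for codec, data in encoded.items():
    # Each byte string decodes back to the original text only with its
    # own codec.
    assert data.decode(codec) == text
    print(codec, data)
```

The character è comes out as 0xE8 in iso-8859-1, as the two bytes 0xC3 0xA8 in utf-8, and as 0x8A in cp850.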
It *may* be that the encoding returned by
locale.getdefaultlocale()[1]
is the one that should be used (and it is on my system), or it may be
that the OP only needs the value to work on a single computer and can
determine the right encoding through educated guessing.
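Here is a minimal Python 3 sketch of letting the environment choose the encoding. It substitutes locale.getpreferredencoding(False) for locale.getdefaultlocale()[1], because the latter is deprecated in recent Python 3 releases and can return None. It also tries sys.stdout.encoding first, since that is what Python 3's print uses. The helper name encode_for_output is hypothetical. In Python 3 you can usually just print the unicode string and let the interpreter do this for you.

```python
import locale
import sys


def encode_for_output(text):
    """Return (encoding, bytes) for the encoding the output most likely expects.

    Tries sys.stdout.encoding, then the locale's preferred encoding,
    and falls back to UTF-8 if neither is available.
    """
    encoding = (getattr(sys.stdout, "encoding", None)
                or locale.getpreferredencoding(False)
                or "utf-8")
    # errors="replace" turns characters the chosen encoding cannot
    # represent into "?" instead of raising UnicodeEncodeError.
    return encoding, text.encode(encoding, errors="replace")


enc, data = encode_for_output("questa \xe8 bella")
print(enc, data)
```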
Jeff
PS: It looks like my mailer will re-encode this message as Latin-1 when it
sends it, so who knows whether the u'\xe8' characters will continue to
display correctly.
More information about the Python-list mailing list