Unicode confusion

Mark Tolonen M8R-yfto6h at mailinator.com
Tue Jul 15 09:03:51 CEST 2008


"Jerry Hill" <malaclypse2 at gmail.com> wrote in message 
news:mailman.14.1216054283.922.python-list at python.org...
> On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <timothywayne.cook at gmail.com> 
> wrote:
>> if I say units=unicode("°").  I get
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
>> ordinal not in range(128)
>>
>> If I try x=unicode.decode(x,'utf-8'). I get
>> TypeError: descriptor 'decode' requires a 'unicode' object but received
>> a 'str'
>>
>> What is the correct way to interpret these symbols that come to me as a
>> string?
>
> Part of it depends on where you're getting them from.  If they are in
> your source code, just define them like this:
>
>>>> units = u"°"
>>>> print units
> °
>>>> print repr(units)
> u'\xb0'
>
> If they're coming from an external source, you have to know the
> encoding they're being sent in.  Then you can decode them into
> unicode, like this:
>
>>>> units = "°"
>>>> unicode_units = units.decode('Latin-1')
>>>> print repr(unicode_units)
> u'\xb0'
>>>> print unicode_units
> °
>
> -- 
> Jerry
>
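
A minimal sketch of why the codec matters. The byte strings are illustrative and the code should run under Python 2.6+ and 3.x. The degree sign is the single byte 0xb0 in Latin-1 but the two bytes 0xc2 0xb0 in UTF-8, so the 0xc2 in the traceback above suggests the incoming bytes are UTF-8. `unicode(s)` without an encoding falls back to the ascii codec. Also, `unicode.decode(x, 'utf-8')` calls the method on the `unicode` type with a `str`. The bound form `x.decode('utf-8')` (or `unicode(x, 'utf-8')`) is what was intended.

```python
# Illustrative bytes only; runs under Python 2.6+ and 3.x.
utf8_bytes = b"\xc2\xb0"   # degree sign U+00B0 encoded as UTF-8
latin1_bytes = b"\xb0"     # the same character encoded as Latin-1

# Decoding with the encoding the bytes were actually written in works.
from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1")
assert from_utf8 == from_latin1 == u"\u00b0"

# The ascii codec rejects any byte >= 0x80, as in the original traceback.
ascii_error = None
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
print(ascii_error)

# A wrong guess is not always an error: Latin-1 accepts every byte, so the
# UTF-8 input silently decodes to two characters (U+00C2 U+00B0), not one.
mojibake = utf8_bytes.decode("latin-1")
assert mojibake == u"\u00c2\u00b0"
```

This is why Jerry's `decode('Latin-1')` example is only correct if the bytes really are Latin-1. Applied to UTF-8 input, it produces the wrong text without raising an error.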

Even with source code you have to know the encoding.  Before 3.x, Python 
defaults to ASCII encoding for source files:

test.py contains:
units = u"°"

>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xb0' in file test.py on line 1, but no 
encoding declared; see http://www.python.org/peps/pep-0263.html for details

The encoding of the source file can be declared with a comment on the 
first or second line:

# coding: latin-1
units = u"°"

>>> import test
>>> test.units
u'\xb0'
>>> print test.units
°
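
A rough sketch of how a tool might find that declaration, using the regular expression given in PEP 263. The helper name `declared_encoding` is hypothetical. The interpreter's own tokenizer also handles details this ignores, such as a UTF-8 BOM. The "ascii" fallback mirrors pre-3.x Python; Python 3 assumes utf-8 when nothing is declared.

```python
import re

# Regular expression from PEP 263; it is checked against line 1 or 2 only.
CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source, default="ascii"):
    """Hypothetical helper: return the encoding named by a PEP 263 comment in
    the first two lines of `source` (bytes), else `default` (ascii mirrors
    pre-3.x Python; Python 3 would assume utf-8)."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


print(declared_encoding(b'units = u"\xb0"\n'))                     # ascii
print(declared_encoding(b'# coding: latin-1\nunits = u"\xb0"\n'))  # latin-1
```

The Emacs-style `# -*- coding: utf-8 -*-` form also matches, because the pattern only needs `coding:` or `coding=` somewhere in a comment.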

Make sure to declare the correct encoding!  Here the file was saved in 
latin-1, but declared as utf8:

# coding: utf8
units = u"°"

>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0: 
unexpected code byte
>>>
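
One way to repair such a file, sketched under the assumption that its bytes really are Latin-1: decode with the encoding the editor actually used, then re-encode to the one the declaration names. The in-memory byte string below stands in for the file contents. Re-saving the file as UTF-8 from the editor achieves the same thing.

```python
# Stand-in for a file saved as Latin-1 but declaring utf8 (illustrative).
source = b'# coding: utf8\nunits = u"\xb0"\n'

# Strict UTF-8 decoding fails: a lone 0xb0 byte is not valid UTF-8.
bad_offset = None
try:
    source.decode("utf-8")
except UnicodeDecodeError as exc:
    bad_offset = exc.start  # byte offset of the offending 0xb0

# Decode with the encoding actually used, re-encode with the declared one.
fixed = source.decode("latin-1").encode("utf-8")
assert fixed.decode("utf-8") == source.decode("latin-1")
```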

--
Mark 



