Unicode confusion

Jerry Hill malaclypse2 at gmail.com
Mon Jul 14 12:51:01 EDT 2008


On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <timothywayne.cook at gmail.com> wrote:
> if I say units=unicode("°").  I get
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
> ordinal not in range(128)
>
> If I try x=unicode.decode(x,'utf-8'). I get
> TypeError: descriptor 'decode' requires a 'unicode' object but received
> a 'str'
>
> What is the correct way to interpret these symbols that come to me as a
> string?

That depends on where the strings come from.  If they are literals in
your source code, just define them as unicode literals, like this:

>>> units = u"°"
>>> print units
°
>>> print repr(units)
u'\xb0'
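
A small sketch of an alternative, in case typing the character directly
is a problem: a literal "°" in a .py file depends on how the file itself
is encoded (Python 2 wants a coding declaration at the top for non-ASCII
source), whereas an escape sequence is plain ASCII and sidesteps that.
The variable names below are only for illustration.

```python
# Three spellings of the same one-character unicode string (the degree sign).
# All of them are pure ASCII in the source file.
by_hex = u"\xb0"                 # hex escape
by_codepoint = u"\u00b0"         # 4-digit code point escape
by_name = u"\N{DEGREE SIGN}"     # Unicode character name

assert by_hex == by_codepoint == by_name
assert len(by_name) == 1
```

Any of the three can be used wherever u"°" is used above.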

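If they're coming from an external source, you have to know which
encoding they were sent in.  Then you can decode the byte string into
unicode by calling decode() on the str itself (unicode.decode expects a
unicode object, which is why you got the TypeError).  The session below
assumes Latin-1 data; the 0xc2 byte in your traceback suggests yours is
UTF-8, in which case pass 'utf-8' instead: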

>>> units = "°"
>>> unicode_units = units.decode('Latin-1')
>>> print repr(unicode_units)
u'\xb0'
>>> print unicode_units
°
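
To make the effect of the encoding choice concrete, here is a rough
sketch with the bytes written out explicitly, so it does not depend on
what your terminal sends.  It assumes the degree sign arrives either as
UTF-8 (two bytes, 0xc2 0xb0) or as Latin-1 (one byte, 0xb0); the
variable names are made up for the example.

```python
# The degree sign as raw bytes in two different encodings.
utf8_bytes = b"\xc2\xb0"     # UTF-8: two bytes
latin1_bytes = b"\xb0"       # Latin-1: one byte

# Decoding with the matching codec yields the same unicode string.
from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1")
assert from_utf8 == from_latin1 == u"\xb0"

# Decoding as ASCII fails on the first byte above 0x7f, which is the
# error reported in the question.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the wrong 8-bit codec does not fail, but gives two
# characters instead of one.
mojibake = utf8_bytes.decode("latin-1")
assert len(mojibake) == 2
```

The last case is the one to watch for: Latin-1 accepts any byte, so a
wrong guess produces garbage silently rather than raising an error.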

-- 
Jerry

