[Tutor] simple unicode question

Albert-Jan Roskam fomcl at yahoo.com
Fri Aug 22 23:10:21 CEST 2014


Hi,

I have data that is either floats or UTF-8 encoded byte strings, and I need to convert both to unicode strings. I am probably missing something simple, but in the code below, under "float", why does [B] raise an error while [A] does not?


# Python 2.7.3 (default, Feb 27 2014, 19:39:10) [GCC 4.7.2] on linux2
>>> help(unicode)
Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
 |  
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the **current default string encoding**.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
# ... 
>>> import sys
>>> sys.getdefaultencoding()
'ascii'

# float: cannot explicitly give encoding, even if it's the default
>>> value = 1.0
>>> unicode(value)      # [A]
u'1.0'
>>> unicode(value, sys.getdefaultencoding())  # [B]

Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    unicode(value, sys.getdefaultencoding())
TypeError: coercing to Unicode: need string or buffer, float found
>>> unicode(value, "utf-8")
# (... also TypeError)

# byte string: must explicitly give encoding (which makes perfect sense)
>>> value = '\xc3\xa9'
>>> unicode(value)

Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    unicode(value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> unicode(value, "utf-8")
u'\xe9'
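
For reference, the two cases above could be combined into one helper along these lines. This is only a minimal sketch: the names to_text and text_type are hypothetical, and the text_type alias is there just so the snippet also runs on Python 3, where unicode no longer exists. As far as I can tell from the Python 2 docs, the two-argument form of unicode() always tries to decode its first argument and so needs a string or buffer, while the one-argument form behaves like str() but returns unicode. So the sketch decodes byte strings only and uses the one-argument call for everything else.

```python
# Hypothetical alias: unicode on Python 2, str on Python 3.
try:
    text_type = unicode
except NameError:
    text_type = str


def to_text(value, encoding="utf-8"):
    """Return value as a unicode (text) string.

    Byte strings are decoded with the given encoding; anything else
    (floats, ints, ...) goes through the one-argument constructor.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return text_type(value)


# Usage: the same call handles both kinds of input.
print(repr(to_text(1.0)))
print(repr(to_text(b'\xc3\xa9')))
```

The encoding is only ever applied to byte strings, so a float never reaches the decoding step that raises the TypeError in [B].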


Thank you!

Regards,

Albert-Jan




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

