[Tutor] testing u=unicode(str, 'utf-8') and u = str.decode('utf-8')

Kent Johnson kent37 at tds.net
Thu Apr 6 12:01:58 CEST 2006


Keo Sophon wrote:
> Hi,
> 
> Today I tested u=unicode(str,'utf-8') and u=str.decode('utf-8'). Then in both 
> cases I used:
> 
> if isinstance(u, str):
>     print "just string"
> else:
>     print "unicode"
> 
> The result in both cases is "unicode". So it seems u=unicode(str,'utf-8') and 
> u=str.decode('utf-8') are the same. How about the processing inside? Is it 
> the same?

I don't know the details of how they are implemented, but they do have 
the same result. As far as I know, you can use whichever form you find 
more readable.
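
A quick way to check the equivalence yourself is sketched below. It is 
only an illustration, not a statement about the implementation. The name 
text_type is a hypothetical helper (not from the original question) that 
resolves to unicode on Python 2 and to str on Python 3, so the same 
lines run on either version:

```python
# text_type is a hypothetical helper name: unicode on Python 2, str on Python 3.
text_type = type(u'')

# A byte string holding UTF-8 encoded data (the word "cafe" with an accented e).
raw = u'caf\xe9'.encode('utf-8')

# Form 1: pass the bytes and the encoding to the text constructor.
via_constructor = text_type(raw, 'utf-8')

# Form 2: call the decode() method on the byte string.
via_method = raw.decode('utf-8')

# Both forms should give the same value and the same type.
same_value = via_constructor == via_method
same_type = type(via_constructor) is type(via_method)
```

If both same_value and same_type are true, the two forms are 
interchangeable for ordinary text encodings such as UTF-8.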

There are a few special-purpose encodings for which the result of 
decode() is a byte string rather than a unicode string; for these 
encodings, you have to use str.decode(). For example:

In [42]: 'abc'.decode('string_escape')
Out[42]: 'abc'

In [44]: unicode('abc', 'string_escape')
------------------------------------------------------------
Traceback (most recent call last):
   File "<ipython console>", line 1, in ?
TypeError: decoder did not return an unicode object (type=str)
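
The same distinction can be sketched in a form that should run on both 
Python 2 and current Python 3. The assumptions here are mine, not part 
of the session above: 'string_escape' does not exist on Python 3, so the 
sketch uses 'hex_codec' (another codec whose decoded result is a byte 
string), it calls codecs.decode() instead of the decode() method, and 
text_type is a hypothetical helper name for the text type of the running 
interpreter.

```python
import codecs

# text_type is a hypothetical helper name: unicode on Python 2, str on Python 3.
text_type = type(u'')

# Hex digits that spell out the three bytes 'abc'.
hex_bytes = b'616263'

# codecs.decode() accepts codecs whose decoded result is a byte string.
decoded = codecs.decode(hex_bytes, 'hex_codec')

# The text constructor requires a text result, so it rejects this codec.
# Python 2 raises TypeError here; current Python 3 raises LookupError.
try:
    text_type(hex_bytes, 'hex_codec')
    constructor_rejected = False
except (TypeError, LookupError):
    constructor_rejected = True
```

Here decoded is a byte string, not a text string, which is why the 
constructor form cannot be used with this kind of codec.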

Kent


