[Tutor] testing u=unicode(str, 'utf-8') and u = str.decode('utf-8')
Kent Johnson
kent37 at tds.net
Thu Apr 6 12:01:58 CEST 2006
Keo Sophon wrote:
> Hi,
>
> Today I tested u=unicode(str,'utf-8') and u=str.decode('utf-8'). Then in both
> cases I used:
>
> if isinstance(u,str):
>     print "just string"
> else:
>     print "unicode"
>
> The result in both cases is "unicode". So it seems u=unicode(str,'utf-8') and
> u=str.decode('utf-8') are the same. What about the processing inside? Is it
> the same?

I don't know the details of how they are implemented, but they do produce
the same result. As far as I know, you can use whichever form you find
more readable.
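A minimal sketch of the comparison from the question, written so it runs on both Python 2 and Python 3. The `text_type` alias and the sample bytes are illustrative choices, not part of the original thread: on Python 3 the `unicode` type was renamed `str`, so the alias picks whichever name exists. The sketch also avoids naming the byte string `str`, which would shadow the built-in type.

```python
# Illustrative alias (not from the original post): `unicode` on Python 2,
# `str` on Python 3, where the name `unicode` no longer exists.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3

# Sample UTF-8 encoded bytes for the text "cafe" with an acute accent on the e.
raw = b'caf\xc3\xa9'

u1 = text_type(raw, 'utf-8')  # constructor form, like unicode(s, 'utf-8')
u2 = raw.decode('utf-8')      # method form, like s.decode('utf-8')

# Both forms yield a text (unicode) object with the same value.
assert isinstance(u1, text_type) and isinstance(u2, text_type)
assert u1 == u2
```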

There are a few special-purpose encodings for which the result of
decode() is a byte string rather than a unicode string. For these
encodings unicode() fails, because it requires the decoder to return a
unicode object, so you have to use str.decode(). For example:

In [42]: 'abc'.decode('string_escape')
Out[42]: 'abc'
In [44]: unicode('abc', 'string_escape')
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in ?
TypeError: decoder did not return an unicode object (type=str)
Kent