unicode() vs. s.decode()
Michael Ströder
michael at stroeder.com
Thu Aug 6 12:26:09 EDT 2009
Thorsten Kampe wrote:
> * Michael Ströder (Wed, 05 Aug 2009 16:43:09 +0200)
>> Both of these expressions are equivalent, but which one is faster or
>> should be preferred for any reason?
>>
>> u = unicode(s,'utf-8')
>>
>> u = s.decode('utf-8') # looks nicer
>
> "decode" was added in Python 2.2 for the sake of symmetry with encode().
Yes, and I like the style. But...
> It's essentially the same as unicode() and I wouldn't be surprised if it
> is exactly the same.
Did you try?
> I don't think there will be any measurable speed difference between
> those two.
Well, that seems not to be true. Try it yourself. I did (my console uses UTF-8 as its charset):
Python 2.6 (r26:66714, Feb 3 2009, 20:52:03)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer("'äöüÄÖÜß'.decode('utf-8')").timeit(1000000)
7.2721178531646729
>>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000000)
7.1302499771118164
>>> timeit.Timer("unicode('äöüÄÖÜß','utf8')").timeit(1000000)
8.3726329803466797
>>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000000)
1.8622009754180908
>>> timeit.Timer("unicode('äöüÄÖÜß','utf8')").timeit(1000000)
8.651669979095459
>>>
Comparing the two best combinations again, this time with ten times as many iterations:
>>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(10000000)
17.23644495010376
>>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(10000000)
72.087096929550171
That is significant: roughly a factor of four in this run! So the winner is:
unicode('äöüÄÖÜß','utf-8')
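For anyone who wants to repeat the measurement, here is a minimal sketch of the same comparison as a script. It is not the session above: the names SETUP, STATEMENTS, benchmark and the alias `text` are made up for illustration, the sample bytes are written as escapes so the console charset does not matter, and `text` is bound in the timeit setup (so it is a local name, unlike the builtin lookup of unicode() in the transcript). Absolute numbers will differ by interpreter version and machine.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import timeit

# Setup code executed once per timing run.
# s holds the UTF-8 bytes of u'äöüÄÖÜß'; text is an illustrative alias
# that resolves to unicode on Python 2 and to str on Python 3.
SETUP = (
    "s = b'\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc\\xc3\\x84\\xc3\\x96\\xc3\\x9c\\xc3\\x9f'\n"
    "text = type(u'')\n"
)

# The four combinations from the session: constructor vs. method,
# and 'utf-8' vs. 'utf8' as the encoding name.
STATEMENTS = [
    "text(s, 'utf-8')",
    "text(s, 'utf8')",
    "s.decode('utf-8')",
    "s.decode('utf8')",
]


def benchmark(number=100000, repeat=3):
    """Return a dict mapping each statement to its best time in seconds."""
    results = {}
    for stmt in STATEMENTS:
        timer = timeit.Timer(stmt, setup=SETUP)
        # Take the minimum of several runs to reduce noise from other processes.
        results[stmt] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    # Print the fastest combination first.
    for stmt, seconds in sorted(benchmark().items(), key=lambda kv: kv[1]):
        print("%-20s %.4f s" % (stmt, seconds))
```

The spelling of the encoding name is timed separately for both forms because the numbers above suggest that it matters at least as much as the choice between unicode() and decode().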
Ciao, Michael.