unicode() vs. s.decode()
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Thu Aug 6 15:17:30 EDT 2009
On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote:
> > That is significant! So the winner is:
> >
> > unicode('äöüÄÖÜß','utf-8')
>
> Unless you are planning to write a loop that decodes "äöüÄÖÜß" one
> million times, these benchmarks are meaningless.
What if you're writing a loop which takes one million different lines of
text and decodes them once each?
>>> import timeit
>>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
>>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
>>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
>>> t1.timeit(number=1)
5.6751680374145508
>>> t2.timeit(number=1)
2.6822888851165771
Seems like a pretty meaningful difference to me.
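For anyone wanting to rerun this kind of comparison on Python 3, where unicode() no longer exists, here is a rough sketch. It assumes that bytes.decode() and str(bytes, encoding) are the nearest equivalents of the two calls timed above. The helper names (decode_method, decode_constructor, best_time) and the smaller list size are illustrative choices, not part of the original benchmark. Timings depend on the machine and interpreter version, so none are shown.

```python
import timeit

# Assumed Python 3 analogue of the benchmark above: a list of byte strings
# of varying length. The list is kept smaller than 1,000,000 entries so
# that the sketch runs quickly.
lines = [b"abc" * (n % 100) for n in range(10000)]


def decode_method(data):
    """Decode every line with the bytes.decode() method."""
    return [line.decode("utf-8") for line in data]


def decode_constructor(data):
    """Decode every line with the str(bytes, encoding) constructor."""
    return [str(line, "utf-8") for line in data]


def best_time(func, data, repeat=3):
    """Return the fastest of `repeat` single passes over the data."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))


t_method = best_time(decode_method, lines)
t_constructor = best_time(decode_constructor, lines)
print("line.decode('utf-8'):", t_method)
print("str(line, 'utf-8')  :", t_constructor)
```

Taking the minimum of a few repeats reduces noise from other processes. Whether the gap measured above still shows up on a modern interpreter is something to measure rather than assume.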
--
Steven