unicode() vs. s.decode()
thorsten at thorstenkampe.de
Fri Aug 7 12:12:32 CEST 2009
* Michael Ströder (Fri, 07 Aug 2009 03:25:03 +0200)
> Thorsten Kampe wrote:
> > * Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200)
> >> >>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(10000000)
> >> 17.23644495010376
> >> >>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(10000000)
> >> 72.087096929550171
> >> That is significant! So the winner is:
> >> unicode('äöüÄÖÜß','utf-8')
> > Unless you are planning to write a loop that decodes "äöüÄÖÜß" ten
> > million times, these benchmarks are meaningless.
> Well, I can tell you I would not have posted this here and checked it if it
> were meaningless to me. You don't have to read and answer this thread if
> it's meaningless to you.
Again: if you think decoding "äöüÄÖÜß" ten million times is a real-world
use case for your module, then go for unicode(). Otherwise the time you
spent benchmarking artificial cases like this is wasted. In
real life people won't even notice whether an application takes one or
two minutes to complete.
Use whatever you prefer (decode() or unicode()). If you experience
performance bottlenecks when you're done, test whether changing decode()
to unicode() makes a difference. /That/ is relevant.
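
A minimal sketch of such a test, assuming the bottleneck has already been narrowed down to decoding. The helper names (decode_with_constructor, decode_with_method, compare) and the payload are hypothetical; substitute the bytes your application actually decodes. The try/except only exists so the same file also runs on Python 3, where str(raw, 'utf-8') stands in for unicode(raw, 'utf-8').

```python
import timeit

try:
    text_type = unicode  # Python 2, the version discussed in this thread
except NameError:
    text_type = str      # Python 3: str(bytes, encoding) plays the same role

# UTF-8 bytes of the sample string from the thread, written with escapes
# so the file needs no source-encoding declaration.
SAMPLE = u'\xe4\xf6\xfc\xc4\xd6\xdc\xdf'.encode('utf-8')


def decode_with_constructor(raw):
    """Decode via the constructor call, i.e. unicode(raw, 'utf-8')."""
    return text_type(raw, 'utf-8')


def decode_with_method(raw):
    """Decode via the method call, i.e. raw.decode('utf-8')."""
    return raw.decode('utf-8')


def compare(raw, number=100000):
    """Return the seconds each variant needs for `number` calls on `raw`."""
    return {
        'constructor': timeit.timeit(
            lambda: decode_with_constructor(raw), number=number),
        'method': timeit.timeit(
            lambda: decode_with_method(raw), number=number),
    }


if __name__ == '__main__':
    # Hypothetical workload: a larger buffer, decoded a modest number of times.
    payload = SAMPLE * 1000
    for name, seconds in sorted(compare(payload, number=10000).items()):
        print('%-12s %.3f s' % (name, seconds))
```

Both variants return the same text, so whichever one you keep is a matter of measured time on your own data and of readability. Note that the lambda adds its own call overhead to both measurements equally.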