unicode() vs. s.decode()

Fri Aug 7 06:12:32 EDT 2009

* Michael Ströder (Fri, 07 Aug 2009 03:25:03 +0200)
> Thorsten Kampe wrote:
> > * Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200)
> >> >>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(10000000)
> >> 17.23644495010376
> >> >>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(10000000)
> >> 72.087096929550171
> >>
> >> That is significant! So the winner is:
> >>
> >> unicode('äöüÄÖÜß','utf-8')
> > 
> > Unless you are planning to write a loop that decodes "äöüÄÖÜß" ten 
> > million times, these benchmarks are meaningless.
> 
> Well, I can tell you I would not have posted this here and checked it if it
> were meaningless to me. You don't have to read and answer this thread if
> it's meaningless to you.

Again: if you think decoding "äöüÄÖÜß" ten million times is a real-world 
use case for your module, then go for unicode(). Otherwise the time you 
spent benchmarking artificial cases like this is simply wasted. In 
real life people won't even notice whether an application takes one or 
two minutes to complete.

Use whatever you prefer (decode() or unicode()). If you experience 
performance bottlenecks when you're done, test whether changing decode() 
to unicode() makes a difference. /That/ is relevant.
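
A minimal sketch of such a test, for illustration only. It times both 
spellings on a batch of records instead of on one literal. The names 
SAMPLE, RECORDS, decode_with_constructor, decode_with_method and compare 
are made up for this example, and the workload (1000 short UTF-8 strings) 
is an assumption; substitute data from the real application. The 
try/except is only there so the sketch also runs on Python 3, where 
unicode() no longer exists and str(bytes, encoding) plays the same role.

```python
import timeit

try:
    unicode                # Python 2, as used in this thread
except NameError:
    unicode = str          # Python 3: str(bytes, encoding) decodes as well

# Assumed workload: the seven characters from the thread, UTF-8 encoded,
# repeated 1000 times. Replace with input from the real application.
SAMPLE = u"\xe4\xf6\xfc\xc4\xd6\xdc\xdf".encode("utf-8")
RECORDS = [SAMPLE] * 1000


def decode_with_constructor(records):
    """Decode every record with unicode(record, 'utf-8')."""
    return [unicode(r, "utf-8") for r in records]


def decode_with_method(records):
    """Decode every record with record.decode('utf-8')."""
    return [r.decode("utf-8") for r in records]


def compare(records, repeat=3, number=100):
    """Return the best-of-`repeat` time in seconds for each variant."""
    results = {}
    for func in (decode_with_constructor, decode_with_method):
        timer = timeit.Timer(lambda f=func: f(records))
        # The minimum of several runs is the least disturbed measurement.
        results[func.__name__] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    for name, seconds in sorted(compare(RECORDS).items()):
        print("%-26s %.4f s" % (name, seconds))
```

The numbers only mean something relative to the total run time of the 
application: if decoding is a small fraction of it, neither spelling 
will make a noticeable difference.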

Thorsten