unicode() vs. s.decode()

Fri Aug 7 06:00:42 EDT 2009

* Steven D'Aprano (06 Aug 2009 19:17:30 GMT)
> On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote:
> > > That is significant! So the winner is:
> > > 
> > > unicode('äöüÄÖÜß','utf-8')
> > 
> > Unless you are planning to write a loop that decodes "äöüÄÖÜß" one
> > million times, these benchmarks are meaningless.
> 
> What if you're writing a loop which takes one million different lines of 
> text and decodes them once each?
> 
> >>> import timeit
> >>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
> >>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
> >>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
> >>> t1.timeit(number=1)
> 5.6751680374145508
> >>> t2.timeit(number=1)
> 2.6822888851165771
> 
> Seems like a pretty meaningful difference to me.

Bollocks. No one will even notice whether a code sequence runs for 2.7 or 
5.7 seconds. That's completely artificial benchmarking.
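
For scale: the quoted figures differ by about 3 seconds over one million 
lines, which is roughly 3 microseconds per line. Anyone who wants to check 
this on their own machine could use something like the rough sketch below. 
It is only an illustration under a few assumptions: it targets Python 3, 
where unicode() no longer exists, so str(line, "utf-8") stands in for it; 
the names make_lines and time_decoders are made up for this example; and 
the list is shortened to 10000 lines so it finishes quickly.

```python
import timeit

LINE_COUNT = 10000  # assumption: far fewer lines than the quoted 1,000,000


def make_lines(count=LINE_COUNT):
    # Same shape as the quoted setup string, but as bytes (Python 3).
    return [b"abc" * (n % 100) for n in range(count)]


def time_decoders(lines, repeat=3):
    # Hypothetical helper: time one pass over the list with each spelling.
    def with_method():
        for line in lines:
            line.decode("utf-8")

    def with_constructor():
        # Python 3 stand-in for Python 2's unicode(line, "utf-8").
        for line in lines:
            str(line, "utf-8")

    # Take the best of several runs to reduce noise from other processes.
    return {
        "decode": min(timeit.repeat(with_method, number=1, repeat=repeat)),
        "constructor": min(timeit.repeat(with_constructor, number=1, repeat=repeat)),
    }


results = time_decoders(make_lines())
# Cost of a single call, which is what one decode in real code would pay.
per_line = {name: secs / LINE_COUNT for name, secs in results.items()}

for name in sorted(per_line):
    print("%-12s %.3f microseconds per line" % (name, per_line[name] * 1e6))
```

The numbers will vary with machine and interpreter version, so treat the 
per-line figure as an order of magnitude, not a verdict.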

Thorsten