unicode() vs. s.decode()
Thorsten Kampe
thorsten at thorstenkampe.de
Fri Aug 7 06:00:42 EDT 2009
* Steven D'Aprano (06 Aug 2009 19:17:30 GMT)
> On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote:
> > > That is significant! So the winner is:
> > >
> > > unicode('äöüÄÖÜß','utf-8')
> >
> > Unless you are planning to write a loop that decodes "äöüÄÖÜß" one
> > million times, these benchmarks are meaningless.
>
> What if you're writing a loop which takes one million different lines of
> text and decodes them once each?
>
> >>> import timeit
> >>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
> >>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
> >>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
> >>> t1.timeit(number=1)
> 5.6751680374145508
> >>> t2.timeit(number=1)
> 2.6822888851165771
>
> Seems like a pretty meaningful difference to me.
Bollocks. No one will even notice whether a code sequence runs for 2.7
or 5.7 seconds. That's completely artificial benchmarking.
Thorsten
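
For anyone who wants to re-run the quoted comparison, here is a minimal sketch. It assumes Python 3, where unicode() no longer exists and str(line, "utf-8") is the closest equivalent; the function names, the smaller list size and the per-line reporting are illustrative choices, not part of the original session. With the figures quoted above, the gap works out to roughly 3 microseconds per decoded line.

```python
import timeit

# Assumption: Python 3, so the inputs are bytes and str(b, "utf-8")
# stands in for the Python 2 call unicode(s, "utf-8").
# The list mirrors the quoted setup but is ten times smaller.
lines = [b"abc" * (n % 100) for n in range(100000)]


def decode_with_method(data):
    """Decode every line with the bytes.decode() method."""
    return [line.decode("utf-8") for line in data]


def decode_with_constructor(data):
    """Decode every line with the str() constructor."""
    return [str(line, "utf-8") for line in data]


def per_line_seconds(func, data, repeat=3):
    """Return the best-of-repeat wall time divided by the number of lines."""
    best = min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))
    return best / len(data)


if __name__ == "__main__":
    for func in (decode_with_method, decode_with_constructor):
        cost = per_line_seconds(func, lines)
        print("%s: %.3f microseconds per line" % (func.__name__, cost * 1e6))
```

Reporting the cost per line rather than the total makes it easier to judge whether the difference matters for a given workload. The absolute numbers will vary with interpreter version and hardware.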