[Python-Dev] Unicode charmap decoders slow

Tue Oct 4 04:25:48 CEST 2005

As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very
slow compared to encoding or decoding with utf-8.  Here I'm working with 53k of
data instead of 53 megs.  (Note: this is a laptop, so it's possible that
thermal or battery management features affected these numbers a bit, but by a
factor of 3 at most)

$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
1000 loops, best of 3: 591 usec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
1000 loops, best of 3: 1.25 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
100 loops, best of 3: 13.5 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
100 loops, best of 3: 13.6 msec per loop

With utf-8 encoding as the baseline, we have
	decode('utf-8')      2.1x as long
	decode('mac-roman') 22.8x as long
	decode('iso8859-1') 23.0x as long

Perhaps this is an area that is ripe for optimization.

Jeff
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20051003/28a511e5/attachment.pgp