
Walter Dörwald wrote:
On 04.10.2005 at 04:25, jepler@unpythonic.net wrote:
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very slow compared to encoding or decoding with utf-8. Here I'm working with 53k of data instead of 53 megs. (Note: this is a laptop, so it's possible that thermal or battery management features affected these numbers a bit, but by a factor of 3 at most.)
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "u.encode('utf-8')"
1000 loops, best of 3: 591 usec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('utf-8')"
1000 loops, best of 3: 1.25 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('mac-roman')"
100 loops, best of 3: 13.5 msec per loop
$ timeit.py -s "s='a'*53*1024; u=unicode(s)" "s.decode('iso8859-1')"
100 loops, best of 3: 13.6 msec per loop
With utf-8 encoding as the baseline, we have
    decode('utf-8')      2.1x as long
    decode('mac-roman')  22.8x as long
    decode('iso8859-1')  23.0x as long
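The ratios above follow directly from the quoted timings. As a small check, the arithmetic can be sketched as follows (the dictionary and variable names are illustrative only; the numbers are the ones reported above, converted to microseconds):

```python
# Timings quoted above, converted to microseconds per loop.
timings_usec = {
    "encode('utf-8')": 591.0,
    "decode('utf-8')": 1250.0,
    "decode('mac-roman')": 13500.0,
    "decode('iso8859-1')": 13600.0,
}

# utf-8 encoding is the baseline; every other timing is expressed relative to it.
baseline = timings_usec["encode('utf-8')"]
ratios = {name: round(t / baseline, 1) for name, t in timings_usec.items()}

for name, ratio in ratios.items():
    print(f"{name:22s} {ratio:4.1f}x as long")
```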
Perhaps this is an area that is ripe for optimization.
For charmap decoding we might be able to use an array (e.g. a tuple, or an array.array?) of codepoints instead of a dictionary.
Or we could implement this array as a C array (i.e. gencodec.py would generate C code).
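To make the array idea concrete, here is a minimal pure-Python sketch contrasting a dictionary lookup per byte with indexing into a dense 256-entry table. It is written for a current Python 3 rather than the Python 2 used in the timings, and the names (build_table, decode_with_dict, decode_with_table, toy_map) are hypothetical, not part of the codecs module. Mapping undefined bytes to U+FFFD is an assumption made to keep the example short.

```python
# Hypothetical sketch: charmap decoding via a dict vs. a dense 256-entry table.
# None of these helpers exist in the codecs module; they only illustrate the idea.

REPLACEMENT = "\ufffd"  # assumption: undefined bytes decode to U+FFFD


def build_table(decoding_map):
    """Turn a sparse {byte: code point} dict into a dense 256-entry tuple."""
    return tuple(
        chr(decoding_map[b]) if decoding_map.get(b) is not None else REPLACEMENT
        for b in range(256)
    )


def decode_with_dict(data, decoding_map):
    """Decode bytes with one dictionary lookup per input byte."""
    out = []
    for b in data:
        cp = decoding_map.get(b)
        out.append(REPLACEMENT if cp is None else chr(cp))
    return "".join(out)


def decode_with_table(data, table):
    """Decode bytes by plain indexing into the precomputed table."""
    return "".join([table[b] for b in data])


# Toy mapping: ASCII maps to itself, byte 0x80 maps to U+20AC (euro sign).
toy_map = {b: b for b in range(128)}
toy_map[0x80] = 0x20AC
table = build_table(toy_map)

data = b"abc\x80\xff"
assert decode_with_dict(data, toy_map) == decode_with_table(data, table)
```

For an 8-bit charmap the dense table has only 256 entries, so the memory cost on the decoding side is small; the sparse-table concern applies mainly to the encoding direction, where the keys are code points.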
That would be a possibility, yes. Note that the charmap codec was meant as a faster replacement for the old string transpose function. Dictionaries are used for the mapping to avoid having to store huge (largely empty) mapping tables - it's a memory-speed tradeoff.

Of course, a C version could use the same approach as the unicodedatabase module: that of compressed lookup tables...

http://aggregate.org/TechPub/lcpc2002.pdf

genccodec.py anyone ?

--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Oct 04 2005)
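As a rough illustration of the compressed-lookup-table idea, the sketch below splits a sparse mapping into fixed-size blocks and stores each distinct block only once, with a small index pointing at the shared blocks. This is a generic two-level table under assumed parameters (block size 256, -1 as the "no mapping" marker, hypothetical names compress and lookup); it is not taken from the unicodedatabase module or from the linked paper.

```python
# Hypothetical two-level lookup table: index[key >> SHIFT] selects a block,
# and key & MASK selects the entry inside that block.

SHIFT = 8            # assumption: blocks of 256 entries
BLOCK = 1 << SHIFT
MASK = BLOCK - 1
MISSING = -1         # assumption: marker for "no mapping"


def compress(mapping, size):
    """Split a sparse {key: value} mapping over range(size) into an index
    list plus a list of unique blocks; identical blocks are stored once."""
    blocks = []   # unique blocks, each a tuple of BLOCK values
    seen = {}     # block contents -> position in blocks
    index = []    # one entry per BLOCK-sized slice of the key space
    for start in range(0, size, BLOCK):
        block = tuple(mapping.get(k, MISSING) for k in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, key):
    """Two array accesses instead of one dictionary lookup."""
    return blocks[index[key >> SHIFT]][key & MASK]


# Toy encoding map (code point -> byte): ASCII plus U+20AC -> 0x80.
encoding_map = {cp: cp for cp in range(128)}
encoding_map[0x20AC] = 0x80

index, blocks = compress(encoding_map, 0x10000)
print(len(index), "index entries,", len(blocks), "unique blocks")
```

In this toy case the 65536-entry key space needs only three distinct blocks, because every all-empty slice shares a single block. How well real codec tables compress depends on how their mappings cluster.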