[Python-Dev] Unicode charmap decoders slow

M.-A. Lemburg mal at egenix.com
Wed Oct 5 17:52:54 CEST 2005


Hye-Shik Chang wrote:
> On 10/5/05, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>>Of course, a C version could use the same approach as
>>the unicodedatabase module: that of compressed lookup
>>tables...
>>
>>        http://aggregate.org/TechPub/lcpc2002.pdf
>>
>>genccodec.py anyone ?
>>
> 
> 
> I had written a test codec for single-byte character sets once before
> to evaluate algorithms to use in CJKCodecs (it's not a direct
> implementation of what you've mentioned, though). I just ported it
> to unicodeobject (as attached).

Thanks. Please upload the patch to SF.

Looks like we now have two competing patches: yours and the
one written by Walter.

So far you've only compared decoding byte strings into Unicode,
and the two patches seem to be similar in performance there.
Do they differ in encoding performance?
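
For the encoding direction, something along these lines could be used
to get comparable numbers (only a sketch: the setup string and repeat
counts are my own choice, and it assumes your test codec, the
'iso8859_10_fc' in the timings below, also implements encoding):

    import timeit

    # Time u.encode() for the built-in charmap codec and the test codec.
    setup = "u = unicode('a' * 1024 * 1024, 'latin-1')"
    for codec in ('iso8859-1', 'iso8859_10_fc'):
        timer = timeit.Timer("u.encode(%r)" % codec, setup)
        print codec, min(timer.repeat(3, 10)) / 10.0, "sec per call"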

> It showed considerably better results
> than the charmap codec:
> 
> % python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
> "s.decode('iso8859-1')"
> 10 loops, best of 3: 96.7 msec per loop
> % ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
> "s.decode('iso8859_10_fc')"
> 10 loops, best of 3: 22.7 msec per loop
> % ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)"
> "s.decode('utf-8')"
> 100 loops, best of 3: 18.9 msec per loop
> 
> (Note that it doesn't contain any documentation or good error
> handling yet. :-)
> 
> 
> Hye-Shik
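
As an aside, regarding the genccodec.py idea above: here is one way the
compressed-table approach could look, as a rough pure-Python sketch
(the real thing would of course be generated C code, and everything
below, names included, is purely illustrative):

    # Split a 256-entry byte -> code point map into a block index plus
    # a list of unique 16-entry blocks, so identical blocks are shared.
    def gen_tables(decoding_map):
        blocks, index = [], []
        for start in range(0, 256, 16):
            block = tuple([decoding_map.get(b, 0xFFFD)
                           for b in range(start, start + 16)])
            if block not in blocks:
                blocks.append(block)
            index.append(blocks.index(block))
        return index, blocks

    # Decoding is then two lookups per byte: the high nibble selects
    # the block, the low nibble the position inside it.
    def decode(s, index, blocks):
        return u''.join([unichr(blocks[index[ord(c) >> 4]][ord(c) & 0x0F])
                         for c in s])

    # Example with a trivial identity map (i.e. latin-1):
    index, blocks = gen_tables(dict([(i, i) for i in range(256)]))
    assert decode('abc', index, blocks) == u'abc'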

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 05 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
