[Python-Dev] Unicode charmap decoders slow

Thu Oct 6 13:33:11 CEST 2005

On 10/6/05, M.-A. Lemburg <mal at egenix.com> wrote:
> Hye-Shik Chang wrote:
> > (encoding, fastmap codec)
> >
> > % ./python Lib/timeit.py -s "s='a'*53*1024; e='iso8859_10_fc';
> > u=unicode(s, e)" "u.encode(e)"
> > 1000 loops, best of 3: 536 usec per loop
> >
> > (encoding, utf-8 codec)
> >
> > % ./python Lib/timeit.py -s "s='a'*53*1024; e='utf_8'; u=unicode(s,
> > e)" "u.encode(e)"
> > 1000 loops, best of 3: 1.5 msec per loop
>
> I wonder why the UTF-8 codec is slower than the fastmap
> codec in this case.

I guess that resizing made the difference.  fastmap encoder doesn't
resize the output buffer at all in the test case while UTF-8 encoder
allocates 4*53*1024 bytes and resizes it to 53*1024 bytes in the end.

Hye-Shik