[Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

M.-A. Lemburg mal@lemburg.com
Sat, 20 Apr 2002 17:26:05 +0200


loewis@sourceforge.net wrote:
> 
> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv30961
> 
> Modified Files:
>         unicodeobject.c
> Log Message:
> Patch #495401: Count number of required bytes for encoding UTF-8 before
> allocating the target buffer.

Martin, please back out this change again. We have discussed this
several times, and I am against your strategy because it
introduces a performance hit that is out of proportion to the
advantage gained, namely (temporarily) using less memory.
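
To make the trade-off concrete, here is a rough sketch in Python 3.
It is not the C code from unicodeobject.c. The function names and
the fixed factor of 4 bytes per code point are assumptions made for
illustration only, and surrogates get no special handling. The first
encoder counts the exact output size before allocating, the second
allocates generously up front and trims afterwards.

```python
def utf8_len(cp):
    """Number of UTF-8 bytes needed for one code point."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def put_utf8(buf, pos, cp):
    """Write one code point into buf at pos and return the new position."""
    n = utf8_len(cp)
    if n == 1:
        buf[pos] = cp
    elif n == 2:
        buf[pos] = 0xC0 | (cp >> 6)
        buf[pos + 1] = 0x80 | (cp & 0x3F)
    elif n == 3:
        buf[pos] = 0xE0 | (cp >> 12)
        buf[pos + 1] = 0x80 | ((cp >> 6) & 0x3F)
        buf[pos + 2] = 0x80 | (cp & 0x3F)
    else:
        buf[pos] = 0xF0 | (cp >> 18)
        buf[pos + 1] = 0x80 | ((cp >> 12) & 0x3F)
        buf[pos + 2] = 0x80 | ((cp >> 6) & 0x3F)
        buf[pos + 3] = 0x80 | (cp & 0x3F)
    return pos + n


def encode_counting(text):
    """Two passes: count the exact size first, then allocate and fill."""
    cps = [ord(ch) for ch in text]
    size = sum(utf8_len(cp) for cp in cps)  # the extra pass over the input
    buf = bytearray(size)
    pos = 0
    for cp in cps:
        pos = put_utf8(buf, pos, cp)
    return bytes(buf)


def encode_overalloc(text, factor=4):
    """One pass: allocate `factor` bytes per code point, then trim."""
    cps = [ord(ch) for ch in text]
    buf = bytearray(factor * len(cps))  # hypothetical worst-case factor
    pos = 0
    for cp in cps:
        pos = put_utf8(buf, pos, cp)
    return bytes(buf[:pos])
```

Both functions produce the same bytes. They differ only in whether
the cost is paid as an extra pass over the input or as temporarily
unused memory.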

Your own timings show this as well, so I wonder why you checked in
this patch. From the patch log:
"""
For the current
CVS (unicodeobject.c 2.136: MAL's change to use a variable
overalloc), I get

10 spaces                      20.060
100 spaces                     2.600
200 spaces                     2.030
1000 spaces                    0.930
10000 spaces                   0.690
10 spaces, 3 bytes             23.520
100 spaces, 3 bytes            3.730
200 spaces, 3 bytes            2.470
1000 spaces, 3 bytes           0.980
10000 spaces, 3 bytes          0.690
30 bytes                       24.800
300 bytes                      5.220
600 bytes                      3.830
3000 bytes                     2.480
30000 bytes                    2.230

With unicode3.diff (that's the one you checked in), I get

10 spaces                      19.940
100 spaces                     3.260
200 spaces                     2.340
1000 spaces                    1.650
10000 spaces                   1.450
10 spaces, 3 bytes             21.420
100 spaces, 3 bytes            3.410
200 spaces, 3 bytes            2.420
1000 spaces, 3 bytes           1.660
10000 spaces, 3 bytes          1.450
30 bytes                       22.260
300 bytes                      5.830
600 bytes                      4.700
3000 bytes                     3.740
30000 bytes                    3.540
"""

Your patch is faster only for short strings, and then only by up
to about 10% (e.g. 22.26 seconds compared to 24.80 seconds for
30 bytes), whereas the longest inputs in every series get clearly
worse timings, e.g. 3.74 seconds compared to 2.48 seconds for
3000 bytes -- that's a 50% increase in run-time!
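
The relative change for every row of the two timing tables quoted
above can be checked with a few lines of Python. The numbers are
copied from the patch log. The variable and function names are only
illustrative, and the timings are assumed to be in seconds, as in
the text.

```python
labels = [
    "10 spaces", "100 spaces", "200 spaces", "1000 spaces", "10000 spaces",
    "10 spaces, 3 bytes", "100 spaces, 3 bytes", "200 spaces, 3 bytes",
    "1000 spaces, 3 bytes", "10000 spaces, 3 bytes",
    "30 bytes", "300 bytes", "600 bytes", "3000 bytes", "30000 bytes",
]

# Timings for unicodeobject.c 2.136 (variable overallocation).
before = [
    20.060, 2.600, 2.030, 0.930, 0.690,
    23.520, 3.730, 2.470, 0.980, 0.690,
    24.800, 5.220, 3.830, 2.480, 2.230,
]

# Timings with unicode3.diff (count the bytes first).
after = [
    19.940, 3.260, 2.340, 1.650, 1.450,
    21.420, 3.410, 2.420, 1.660, 1.450,
    22.260, 5.830, 4.700, 3.740, 3.540,
]


def percent_change(old, new):
    """Relative change in percent; positive means the patch is slower."""
    return (new - old) / old * 100.0


change = {
    label: percent_change(old, new)
    for label, old, new in zip(labels, before, after)
}

for label in labels:
    print(f"{label:<24}{change[label]:+7.1f}%")
```

The "3000 bytes" row works out to roughly 51%, which matches the
figure quoted above.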

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/