[Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

Tim Peters tim.one@comcast.net
Sun, 21 Apr 2002 00:22:39 -0400


I expect Martin checked in this change because of the unhappy hours he spent
determining that the previous two versions of this function wrote beyond the
memory they allocated.  Since the most recent version still didn't bother to
assert that it wasn't writing out of bounds, I can't blame Martin for
checking in a version that does so assert; since I spent hours on this too,
and this function has a repeated history of bad memory behavior, I viewed
the version Martin replaced as unacceptable.

However, the slowdown on large strings isn't attractive, and the previous
version could easily enough have asserted its memory correctness.
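
To make "asserted its memory correctness" concrete, here is a minimal sketch -- not the
code in unicodeobject.c -- of an overallocating UTF-8 encoder that guards every write
with an assert.  The function name encode_utf8_bmp is hypothetical, the input is assumed
to be BMP-only UCS-2 (no surrogate-pair handling), and error handling is kept minimal.

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Sketch of the "overallocate, then assert" approach: allocate a
     * worst-case buffer up front and assert before every write that
     * the cursor is still inside the allocation. */
    static char *
    encode_utf8_bmp(const uint16_t *s, size_t size, size_t *out_len)
    {
        size_t allocated = 3 * size + 1;  /* worst case: 3 bytes per BMP char + NUL */
        char *buf = malloc(allocated);
        char *p = buf;

        if (buf == NULL)
            return NULL;

        for (size_t i = 0; i < size; i++) {
            uint16_t ch = s[i];

            /* If the size estimate is ever wrong, fail loudly in a debug
             * build instead of silently corrupting the heap. */
            if (ch < 0x80) {
                assert(p + 1 <= buf + allocated);
                *p++ = (char)ch;
            }
            else if (ch < 0x800) {
                assert(p + 2 <= buf + allocated);
                *p++ = (char)(0xC0 | (ch >> 6));
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
            else {
                assert(p + 3 <= buf + allocated);
                *p++ = (char)(0xE0 | (ch >> 12));
                *p++ = (char)(0x80 | ((ch >> 6) & 0x3F));
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
        }
        assert(p < buf + allocated);
        *p = '\0';
        *out_len = (size_t)(p - buf);
        return buf;  /* the caller may realloc down to *out_len + 1 */
    }

The point is not the particular encoding logic but the asserts: they cost nothing in a
release build, and in a debug build they turn an out-of-bounds write into an immediate,
reproducible failure.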

> -----Original Message-----
> From: python-checkins-admin@python.org
> [mailto:python-checkins-admin@python.org] On Behalf Of M.-A. Lemburg
> Sent: Saturday, April 20, 2002 11:26 AM
> To: loewis@sourceforge.net
> Cc: python-checkins@python.org
> Subject: Re: [Python-checkins] python/dist/src/Objects
> unicodeobject.c,2.139,2.140
>
>
> loewis@sourceforge.net wrote:
>>
>> Update of /cvsroot/python/python/dist/src/Objects
>> In directory usw-pr-cvs1:/tmp/cvs-serv30961
>>
>> Modified Files:
>>         unicodeobject.c
>> Log Message:
>> Patch #495401: Count number of required bytes for encoding UTF-8
>> before allocating the target buffer.
>
> Martin, please back out this change again. We have discussed this
> quite a few times, and I am against your strategy: it introduces a
> performance hit that is not justified by the advantage of
> (temporarily) using less memory.
>
> Your own timings show this, so I wonder why you checked in this
> patch. From the patch log:
> """
> For the current
> CVS (unicodeobject.c 2.136: MAL's change to use a variable
> overalloc), I get
>
> 10 spaces                      20.060
> 100 spaces                     2.600
> 200 spaces                     2.030
> 1000 spaces                    0.930
> 10000 spaces                   0.690
> 10 spaces, 3 bytes             23.520
> 100 spaces, 3 bytes            3.730
> 200 spaces, 3 bytes            2.470
> 1000 spaces, 3 bytes           0.980
> 10000 spaces, 3 bytes          0.690
> 30 bytes                       24.800
> 300 bytes                      5.220
> 600 bytes                      3.830
> 3000 bytes                     2.480
> 30000 bytes                    2.230
>
> With unicode3.diff (that's the one you checked in), I get
>
> 10 spaces                      19.940
> 100 spaces                     3.260
> 200 spaces                     2.340
> 1000 spaces                    1.650
> 10000 spaces                   1.450
> 10 spaces, 3 bytes             21.420
> 100 spaces, 3 bytes            3.410
> 200 spaces, 3 bytes            2.420
> 1000 spaces, 3 bytes           1.660
> 10000 spaces, 3 bytes          1.450
> 30 bytes                       22.260
> 300 bytes                      5.830
> 600 bytes                      4.700
> 3000 bytes                     3.740
> 30000 bytes                    3.540
> """
>
> The only case where your patch is faster is for very short
> strings, and then only by a few percent. For all longer strings
> you get worse timings, e.g. 3.74 seconds compared to 2.48
> seconds -- that's a 50% increase in run-time!
>
> Thanks,
> --
> Marc-Andre Lemburg
> CEO eGenix.com Software GmbH
> ______________________________________________________________________
> Company & Consulting:                           http://www.egenix.com/
> Python Software:                   http://www.egenix.com/files/python/
>
>
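
The patch log quoted above describes the opposite strategy: make a first pass that only
counts how many UTF-8 bytes the result needs, then allocate exactly that much before
encoding.  Here is a minimal sketch of such a counting pass, again assuming BMP-only
input; utf8_size_bmp is a hypothetical helper, not the code Martin checked in.

    #include <stddef.h>
    #include <stdint.h>

    /* One pass over a BMP-only UCS-2 string to count the UTF-8 bytes
     * needed, so the caller can allocate an exact-size buffer before
     * encoding. */
    static size_t
    utf8_size_bmp(const uint16_t *s, size_t size)
    {
        size_t nbytes = 0;

        for (size_t i = 0; i < size; i++) {
            uint16_t ch = s[i];
            if (ch < 0x80)
                nbytes += 1;   /* one-byte sequence (ASCII) */
            else if (ch < 0x800)
                nbytes += 2;   /* two-byte sequence */
            else
                nbytes += 3;   /* three-byte sequence */
        }
        return nbytes;
    }

The trade-off is exactly what the timings above show: the extra pass costs time on long
strings, in exchange for never overallocating and making the write bounds easy to prove.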