[Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

Martin v. Loewis martin@v.loewis.de
21 Apr 2002 11:45:34 +0200


Tim Peters <tim.one@comcast.net> writes:

> I expect Martin checked in this change because of the unhappy hours he spent
> determining that the previous two versions of this function wrote beyond the
> memory they allocated.  Since the most recent version still didn't bother to
> assert that it wasn't writing out of bounds, I can't blame Martin for
> checking in a version that does so assert; since I spent hours on this too,
> and this function has a repeated history of bad memory behavior, I viewed
> the version Martin replaced as unacceptable.

Exactly. I think the most recent version is worse than the one we had
before.

> However, the slowdown on large strings isn't attractive, and the previous
> version could easily enough have asserted its memory correctness.

I found the overallocation strategy that this code had just plain
embarrassing: a single three-byte character will cause an
overallocation of three times the memory, which means that the final
realloc will certainly truncate lots of bytes. As a result, we are at
the mercy of the realloc implementation here: if realloc copies memory
when shrinking buffers (as Pymalloc might some day), performance
will get worse.
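
To make the trade-off concrete, here is a minimal sketch of the two
sizing strategies. It is not the code in unicodeobject.c: all function
names are hypothetical, input is assumed to be 16-bit code units, and
surrogates are simply treated as ordinary three-byte units. It shows a
counting pass that yields the exact size, the worst-case size that an
overallocating encoder would request, and an encoder that asserts it
never writes past the buffer it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-8 bytes needed for one 16-bit code unit.
   Simplification: surrogates are not combined, they count as 3 bytes. */
static size_t utf8_len(uint16_t ch)
{
    if (ch < 0x80)
        return 1;
    if (ch < 0x800)
        return 2;
    return 3;
}

/* Counting pass: exact output size, at the cost of a second scan. */
static size_t utf8_exact_size(const uint16_t *s, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += utf8_len(s[i]);
    return total;
}

/* Worst case: assume every code unit needs three bytes, then shrink
   the buffer with realloc once the real size is known. */
static size_t utf8_worst_case_size(size_t n)
{
    return 3 * n;
}

/* Encode into out[0..cap), asserting before each write that the
   buffer is large enough. Returns the number of bytes written. */
static size_t utf8_encode(const uint16_t *s, size_t n,
                          unsigned char *out, size_t cap)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t ch = s[i];
        size_t need = utf8_len(ch);
        assert(pos + need <= cap);  /* never write out of bounds */
        if (need == 1) {
            out[pos++] = (unsigned char)ch;
        } else if (need == 2) {
            out[pos++] = (unsigned char)(0xC0 | (ch >> 6));
            out[pos++] = (unsigned char)(0x80 | (ch & 0x3F));
        } else {
            out[pos++] = (unsigned char)(0xE0 | (ch >> 12));
            out[pos++] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            out[pos++] = (unsigned char)(0x80 | (ch & 0x3F));
        }
    }
    return pos;
}
```

With the counting pass, the buffer is allocated at its final size and
no shrinking realloc is needed. With the worst-case size, the input is
scanned only once, but the cost of the final realloc depends on how
the allocator handles shrinking.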

Since this appears to be a religious issue, I'm backing the patch out.

Regards,
Martin