[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

Tim Peters tim.one@comcast.net
Fri, 26 Apr 2002 17:47:54 -0400


[Guido]
> Would it make sense to change the Unicode object to use pymalloc, and
> to change the UTF-8 codec to count the bytes if the shortest possible
> output would fit in a pymalloc block?

These are independent questions, and I don't know how to answer either
unless you give me a test program that prints the value of the function
you're trying to minimize <0.7 wink>.

The Unicode object currently uses quite an elaborate free list, caching both
PyUnicodeObject structs (which currently use pymalloc), and their str
members (which currently do not).  Whether the str member uses pymalloc
really doesn't have anything to do with what the UTF8 encoder function does
(it returns plain strings, and those already use pymalloc today -- and it's
not entirely clear whether they should either!).

Counting the bytes in the UTF8 encoder could work well, independent of that:
if the result is known to fit in a pymalloc block, just do it; as soon as
it's known that it won't, overallocate with assurance that the system
realloc will give back everything that isn't used.  In the latter case I
believe the code could be made much simpler, by doing a factor-of-4
overallocation from the start (it currently tries 2, then 3, then 4, with a
bunch of embedded-in-the-loops tests to prevent overwrites; I'm not sure why
it bothers with this staggered scheme, since it's going to touch exactly as
much memory as it actually needs regardless, and give all the rest back
untouched).

> (I guess this means that the length of the Unicode string should be
> less than SMALL_REQUEST_THRESHOLD - currently 256.)

For a start, yes.  I'd stick a "Py_" in front of that symbol and expose it
then.  The cutoff test would also have to take into account the size of the
result's PyStringObject header (the whole stringobject enchilada counts
against the threshold).