[Python-Dev] Unicode objects more space efficient than plain strings? can that be?

Tim Peters tim.one@comcast.net
Thu, 02 May 2002 14:36:36 -0400


It's a cute one.  Setting the envar PYTHONMALLOCSTATS in a debug build
zeroes in on the cause:  PyString_Format() allocates its result space via

	reslen = rescnt = fmtcnt + 100;
	result = PyString_FromStringAndSize((char *)NULL, reslen);

and that's way more space (about 100 bytes more) than is actually needed for
the result of

    "abc%d" % i

But it's still small enough for pymalloc to handle on its own, and the
pymalloc realloc currently never shrinks a block (it's in the nature of this
kind of allocator that shrinking requires copying the whole block -- it's "a
speed thing").  So each result string object contains more than 100 bytes of
string space, mostly unused.
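
To put rough numbers on that, here is a small sketch of the arithmetic.  The
helper names are made up for illustration and aren't in the Python source,
and it counts only the string payload (no object header, no trailing NUL):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: bytes PyString_Format() asks for up front,
 * following the "fmtcnt + 100" line quoted above. */
static size_t
requested_bytes(const char *fmt)
{
    return strlen(fmt) + 100;
}

/* Hypothetical helper: bytes actually needed for "abc%d" % i.
 * C99 snprintf with a NULL buffer and size 0 just reports the length. */
static size_t
needed_bytes(int i)
{
    return (size_t)snprintf(NULL, 0, "abc%d", i);
}

/* Hypothetical helper: payload bytes left unused if the block is
 * never shrunk after formatting. */
static size_t
wasted_bytes(int i)
{
    return requested_bytes("abc%d") - needed_bytes(i);
}
```

For i = 12345 that's 105 bytes requested against 8 needed, so 97 payload bytes
sit idle in every result string.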

We could worm around this in lots of ways.  I'm inclined to change the
pymalloc realloc to copy a shrinking block if at least 25% of the input
block would go away, else leave it alone.  In this specific case, something
like 90% of the input block could be reclaimed.
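
A minimal sketch of that 25% rule, assuming the caller knows the old block
size.  This is not pymalloc's actual code:  the function names are invented,
and plain malloc/free stand in for the pool allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy check: return 1 if a shrinking realloc should
 * copy to a smaller block, i.e. at least 25% of the old block would
 * go away.  Written as a multiply to avoid integer-division rounding;
 * fine for the small sizes a small-block allocator handles. */
static int
should_copy_on_shrink(size_t old_size, size_t new_size)
{
    if (new_size >= old_size)
        return 0;                       /* not shrinking at all */
    return 4 * (old_size - new_size) >= old_size;
}

/* Hypothetical realloc built on that policy.  On allocation failure
 * it returns NULL and leaves the old block untouched. */
static void *
sketch_realloc(void *p, size_t old_size, size_t new_size)
{
    void *q;

    assert(p != NULL);
    if (new_size <= old_size && !should_copy_on_shrink(old_size, new_size))
        return p;                       /* small shrink: leave it alone */

    q = malloc(new_size ? new_size : 1);
    if (q == NULL)
        return NULL;
    /* Copy whichever is smaller: the old contents or the new capacity. */
    memcpy(q, p, old_size < new_size ? old_size : new_size);
    free(p);
    return q;
}
```

Under this rule a 100-byte block shrinking to 90 stays where it is, while one
shrinking to 10 gets copied into a smaller block and the original is freed.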