[issue1943] improved allocation of PyUnicode objects

Thu Mar 20 11:04:01 CET 2008

Marc-Andre Lemburg <mal at egenix.com> added the comment:

Antoine, as I've already mentioned in my other comments, I'm -1 on
changing the Unicode object to a variable size object.

I also don't think that the micro-benchmarks you are applying really
test the implementation in real-life situations. The only case where
your patch appears significantly faster is the "long string" case. If
you look at the distribution of the Unicode strings generated by this
case, you'll find that most strings have fewer than 10-20 characters.
Optimizing pymalloc for these cases and tuning the parameters in the
Unicode implementation will likely give you the same performance
increase without having to sacrifice the advantages of using a pointer
in the object rather than inlining the data.
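
To check the length distribution for a given benchmark, something
along these lines would probably do. This is only a rough sketch: the
helper name, the bucket size of 10 and the sample data are made up for
illustration and are not taken from the patch or the benchmark.

```python
from collections import Counter


def length_histogram(strings, bucket_size=10):
    """Count strings per length bucket.

    The key is the lower bound of the bucket, so with bucket_size=10
    the key 0 covers lengths 0..9, the key 10 covers 10..19, and so on.
    """
    counts = Counter()
    for s in strings:
        counts[(len(s) // bucket_size) * bucket_size] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample; substitute the strings the benchmark really creates.
sample = "the quick brown fox jumps over the lazy dog".split()
histogram = length_histogram(sample)
```

If nearly all the counts end up in the first one or two buckets, the
benchmark is mostly exercising short-string allocation.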

I'm +1 on the free list changes, though, in the long run, I think that
putting more effort into improving pymalloc and removing the free lists
altogether would be better.

BTW: Unicode slices would be a possible and fairly attractive target for
a C-level subclass of Unicode objects. The pointer in the Unicode object
implementation could then point to the original Unicode object's buffer
and the subclass would add an extra pointer to the original object
itself (in order to keep it alive). The Unicode type (written by Fredrik
Lundh) I used as the basis for the Unicode implementation worked with
this idea.
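
To illustrate the bookkeeping such a slice object would need, here is a
minimal Python model. It is only a sketch under assumptions: the class
name and the materialize() method are hypothetical, and pure Python
cannot actually share a str buffer, so the model only shows the extra
reference to the original object plus an offset and a length. A real
implementation would be a C subtype whose data pointer points into the
original buffer.

```python
class UnicodeSliceSketch:
    """Illustrative model of a slice that refers to another string."""

    __slots__ = ("_base", "_start", "_length")

    def __init__(self, base, start=None, stop=None):
        # Clamp the bounds the same way normal slicing does.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base      # extra reference keeps the original alive
        self._start = start    # stands in for the pointer into the buffer
        self._length = max(0, stop - start)

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indices only")
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("slice index out of range")
        return self._base[self._start + index]

    def materialize(self):
        """Copy the referenced characters into an independent string."""
        return self._base[self._start:self._start + self._length]


view = UnicodeSliceSketch("hello world", 6, 11)
```

Creating the view copies no character data; a copy is only made when
materialize() is called.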

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1943>
__________________________________