[Python-3000] Allocation of unicode objects

Sun Feb 10 17:27:52 CET 2008

On Feb 10, 2008 4:53 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Since allocation algorithms for various built-in types are currently under
> discussion, I thought I'd mention that there's a patch for turning
> unicode objects into variable-sized objects (rather than using a
> separately-allocated buffer). The aim is to make allocation of those objects
> lighter and to relieve cache and memory pressure a bit.
>
> http://bugs.python.org/issue1943
>
> Marc-André Lemburg expressed skepticism, based on the fact that it made
> subclassing unicode objects as part of C extensions more difficult.

Has anybody ever tried subclassing them that way? The same difficulty
would apply to PyString, and I've never heard this complaint. I think
that given the relative importance of fast strings in Py3k vs. the
convenience of subclassing PyUnicode, the latter may have to suffer.
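
To make the two layouts under discussion concrete, here is a rough sketch
using ctypes. It only models the memory blocks involved. It is not the
actual PyUnicodeObject definition and not the code from the patch: the
field lists are simplified, the 2-byte code unit is an assumption, and the
names SeparateHeader, make_inline_type and allocation_sizes are made up for
illustration.

```python
import ctypes

# Assumption: 2-byte code units; a real build may use a different width.
CodeUnit = ctypes.c_uint16


class SeparateHeader(ctypes.Structure):
    """Fixed-size header whose character data lives in a second block."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("length", ctypes.c_ssize_t),
        ("str", ctypes.POINTER(CodeUnit)),  # points at the separate buffer
    ]


def make_inline_type(n):
    """Build a variable-sized layout: header plus n code units in one block."""
    class Inline(ctypes.Structure):
        _fields_ = [
            ("ob_refcnt", ctypes.c_ssize_t),
            ("ob_type", ctypes.c_void_p),
            ("ob_size", ctypes.c_ssize_t),
            ("data", CodeUnit * (n + 1)),  # +1 for a terminating NUL
        ]
    return Inline


def allocation_sizes(n):
    """Return the block sizes each layout needs for a string of n code units."""
    separate = [
        ctypes.sizeof(SeparateHeader),        # first allocation: the header
        ctypes.sizeof(CodeUnit) * (n + 1),    # second allocation: the buffer
    ]
    inline = [ctypes.sizeof(make_inline_type(n))]  # single allocation
    return separate, inline


if __name__ == "__main__":
    for n in (5, 50, 500):
        separate, inline = allocation_sizes(n)
        print(n, "separate blocks:", separate, "inline block:", inline)
```

The point of the sketch is the count of blocks rather than the exact byte
sizes, which depend on the platform: the separate-buffer layout needs two
allocations per string, the variable-sized layout needs one and drops the
pointer field.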

> And here is a microbenchmark of the thing:
>
> Splitting a small string:
> ./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()"
> -> Unpatched py3k: 26.4 usec per loop
> -> PyVarObject patch: 20.2 usec per loop
>
> Splitting a medium-sized string:
> ./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()"
> -> Unpatched py3k: 458 usec per loop
> -> PyVarObject patch: 316 usec per loop
>
> Splitting a long string:
> ./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()"
> -> Unpatched py3k: 31.3 msec per loop
> -> PyVarObject patch: 17.8 msec per loop
>
> Even if the patch is rejected, I think it is important to remember that
> implementation characteristics of the unicode type will be crucial for Py3k
> performance :-)

Right. I haven't had enough time to review this (or any other patch),
but the idea is very appealing.
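
For anyone who wants to repeat the quoted measurement from a script rather
than from the command line, a minimal sketch with the timeit module follows.
It assumes synthetic text in place of the INTBENCH, LICENSE and Misc/HISTORY
files used above, and time_split is a hypothetical helper name, so the
numbers it prints are not comparable to the quoted ones.

```python
import timeit


def time_split(text, number=100, repeat=3):
    """Return the best per-call time of text.split(), in microseconds."""
    timer = timeit.Timer("s.split()", globals={"s": text})
    # repeat() returns the total time of `number` calls for each run;
    # the minimum is the least disturbed by other activity on the machine.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    # Synthetic inputs of increasing size (an assumption, not the original files).
    for label, words in [("small", 100), ("medium", 2000), ("long", 50000)]:
        text = "lorem ipsum " * words
        print(label, round(time_split(text, number=20), 2), "usec per loop")
```

Running the same script under an unpatched and a patched interpreter gives a
like-for-like comparison, provided both builds use the same compiler options.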

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)