[Python-3000] Allocation of unicode objects
Antoine Pitrou
solipsis at pitrou.net
Sun Feb 10 13:53:53 CET 2008
Hi,
Since allocation algorithms for various built-in types are currently being
discussed, I thought I'd mention that there is a patch for turning
unicode objects into variable-sized objects (rather than using a
separately-allocated buffer). The aim is to make allocation of those objects
lighter, and to relieve cache and memory pressure a bit.
http://bugs.python.org/issue1943
Marc-André Lemburg expressed skepticism, on the grounds that it makes
subclassing unicode objects in C extensions more difficult.
Here are some microbenchmarks of the patch:
Splitting a small string:
./python -m timeit -s "s=open('INTBENCH', 'r').read()" "s.split()"
-> Unpatched py3k: 26.4 usec per loop
-> PyVarObject patch: 20.2 usec per loop
Splitting a medium-sized string:
./python -m timeit -s "s=open('LICENSE', 'r').read()" "s.split()"
-> Unpatched py3k: 458 usec per loop
-> PyVarObject patch: 316 usec per loop
Splitting a long string:
./python -m timeit -s "s=open('Misc/HISTORY', 'r').read()" "s.split()"
-> Unpatched py3k: 31.3 msec per loop
-> PyVarObject patch: 17.8 msec per loop
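
To put the figures above side by side, here is a small sketch that computes
the relative speedup for each case. The timings are copied from the results
above and converted to microseconds; the labels and the helper function names
are only illustrative and are not part of the patch or of any benchmark tool.

```python
# Timings copied from the results above, all expressed in microseconds.
# Each entry is (unpatched py3k, PyVarObject patch).
# The labels "small", "medium" and "long" are illustrative names only.
timings_usec = {
    "small": (26.4, 20.2),
    "medium": (458.0, 316.0),
    "long": (31.3e3, 17.8e3),  # reported in msec, converted to usec
}


def speedup(unpatched, patched):
    """Return how many times faster the patched build is."""
    return unpatched / patched


def time_saved_percent(unpatched, patched):
    """Return the reduction in run time as a percentage of the unpatched time."""
    return 100.0 * (unpatched - patched) / unpatched


for label, (unpatched, patched) in timings_usec.items():
    print(f"{label}: {speedup(unpatched, patched):.2f}x faster, "
          f"{time_saved_percent(unpatched, patched):.0f}% less time")
```

On these numbers the gain grows with the size of the input, from roughly
1.3x on the small string to roughly 1.76x on the long one.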
Even if the patch is rejected, I think it is important to remember that
implementation characteristics of the unicode type will be crucial for Py3k
performance :-)
Regards
Antoine.