[Python-Dev] Optimize Unicode strings in Python 3.3

martin at v.loewis.de
Fri May 4 02:52:46 CEST 2012


> Various notes:
>  * PyUnicode_READ() is slower than reading a Py_UNICODE array.
>  * Some decoders unroll the main loop to process 4 or 8 bytes (on 32- or
> 64-bit CPUs) at each step.
>
> I would be interested to hear about other tricks for optimizing Unicode
> strings in Python, or whether you would like to work on this topic.
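
For readers who have not seen it, the unrolling mentioned above usually takes
the form of an ASCII fast path that inspects one machine word per iteration.
The following is only an illustrative sketch (a hypothetical helper, not the
actual decoder code):

    /* Count the length of the ASCII-only prefix, testing one
       machine word per iteration before falling back to bytes. */
    #include <stddef.h>
    #include <string.h>

    /* On a 32-bit size_t this truncates to 0x80808080, which is
       still the correct mask. */
    #define ASCII_MASK ((size_t) 0x8080808080808080ULL)

    static size_t
    count_ascii_prefix(const unsigned char *p, size_t size)
    {
        const unsigned char *start = p;
        const unsigned char *end = p + size;

        /* Word-at-a-time loop; memcpy avoids unaligned reads. */
        while ((size_t)(end - p) >= sizeof(size_t)) {
            size_t chunk;
            memcpy(&chunk, p, sizeof(chunk));
            if (chunk & ASCII_MASK)   /* some byte is >= 0x80 */
                break;
            p += sizeof(size_t);
        }
        /* Finish byte by byte. */
        while (p < end && *p < 0x80)
            p++;
        return (size_t)(p - start);
    }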

Beyond creation, the most frequent approach is to specialize loops for
all three possible widths, allowing the compiler to hard-code the element
size. This brings performance back to the speed of accessing a
Py_UNICODE array (or faster, for 1-byte strings).
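
As an illustration (a simplified sketch, not code taken from the CPython
sources; max_char is a hypothetical helper), switching on the kind lets
each loop body use a fixed element size:

    /* Assumes the string is already in canonical ("ready") form. */
    #include <Python.h>

    static Py_UCS4
    max_char(PyObject *str)
    {
        int kind = PyUnicode_KIND(str);
        const void *data = PyUnicode_DATA(str);
        Py_ssize_t i, len = PyUnicode_GET_LENGTH(str);
        Py_UCS4 maxchar = 0;

        switch (kind) {
        case PyUnicode_1BYTE_KIND: {
            const Py_UCS1 *p = (const Py_UCS1 *)data;
            for (i = 0; i < len; i++)
                if (p[i] > maxchar) maxchar = p[i];
            break;
        }
        case PyUnicode_2BYTE_KIND: {
            const Py_UCS2 *p = (const Py_UCS2 *)data;
            for (i = 0; i < len; i++)
                if (p[i] > maxchar) maxchar = p[i];
            break;
        }
        default: {  /* PyUnicode_4BYTE_KIND */
            const Py_UCS4 *p = (const Py_UCS4 *)data;
            for (i = 0; i < len; i++)
                if (p[i] > maxchar) maxchar = p[i];
            break;
        }
        }
        return maxchar;
    }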

A possible micro-optimization might be to use pointer arithmetic instead
of indexing. However, I would expect that compilers will already convert
a counting loop into pointer arithmetic if the index is only ever used
for array access.
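
Concretely, these are the two loop shapes being compared (a sketch with
hypothetical fill helpers); most optimizing compilers turn the first into
the second when the index is only used for the array access:

    #include <stddef.h>

    /* Indexed counting loop. */
    static void
    fill_indexed(unsigned char *buf, size_t len, unsigned char ch)
    {
        size_t i;
        for (i = 0; i < len; i++)
            buf[i] = ch;
    }

    /* Explicit pointer arithmetic. */
    static void
    fill_pointer(unsigned char *buf, size_t len, unsigned char ch)
    {
        unsigned char *end = buf + len;
        while (buf < end)
            *buf++ = ch;
    }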

A source of slowdown appears to be widening copy operations (e.g. copying
UCS-1 data into a UCS-2 or UCS-4 buffer). I wonder whether microprocessors
are able to do this faster than what the compiler generates for a naive
copying loop.
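
The naive loop in question looks roughly like this (a hypothetical helper,
not CPython code); the open question above is whether hand-written SIMD
could beat what the compiler emits for it:

    #include <Python.h>

    /* Widen UCS-1 elements into a UCS-2 buffer, zero-extending each one. */
    static void
    widen_ucs1_to_ucs2(const Py_UCS1 *src, Py_UCS2 *dst, Py_ssize_t len)
    {
        Py_ssize_t i;
        for (i = 0; i < len; i++)
            dst[i] = src[i];
    }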

Another potential area for further optimization is better pass-through of
PyObject*. Some APIs still take char* or Py_UNICODE* when the caller actually
holds a PyObject*, and the callee ultimately recreates an object out of the
pointers being passed.
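
A contrived sketch of the pattern (hypothetical functions, only to
illustrate the point): the first variant forces the string object to be
rebuilt from a raw pointer, the second passes the existing object through.

    #include <Python.h>

    /* Existing-style API: the original object is lost, so a new
       string has to be created from the char* for the message. */
    static PyObject *
    report_error_cstr(const char *name)
    {
        return PyUnicode_FromFormat("invalid name: %s", name);
    }

    /* Object-preserving variant: the caller's PyObject* is reused. */
    static PyObject *
    report_error_obj(PyObject *name)
    {
        return PyUnicode_FromFormat("invalid name: %U", name);
    }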

Some people (hi Larry) still think that using a rope representation for
string concatenation might improve things; see #1569040.
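
Very roughly, a rope defers concatenation by building a tree over the
operands instead of copying them; something along these lines (purely
illustrative, not a proposed design -- see the issue for the actual
discussion):

    #include <Python.h>

    /* Hypothetical rope node: leaves hold real string objects,
       internal nodes record the lazy concatenation of two subtrees. */
    typedef struct rope_node {
        struct rope_node *left;    /* NULL for leaf nodes */
        struct rope_node *right;   /* NULL for leaf nodes */
        PyObject *leaf;            /* string object for leaf nodes */
        Py_ssize_t length;         /* total length of this subtree */
    } rope_node;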

Regards,
Martin
