[Python-3000] PyUnicodeObject implementation

Antoine Pitrou solipsis at pitrou.net
Tue Sep 9 13:13:57 CEST 2008


Hello,

M.-A. Lemburg <mal <at> egenix.com> writes:
> 
> It turned out that the patch only provides a marginal performance
> improvement, so the perceived main argument for the PyVarObject
> implementation doesn't turn out to be a real advantage.

Uh, while the results are not always overwhelming, they are nevertheless far
better than those of the simple freelist patch (which is not even always an
improvement).

>  * a fixed object size results in making good use of the Python
>    allocator, since all objects live in the same pool; as a result
>    you have better cache locality - which is good for situations
>    where you have to deal with lots of objects

I'm not sure how cache locality of unrelated unicode objects helps performance.
However, having a separate allocation in a different pool for the raw character
data implies that cache locality is worse when it comes to actually accessing
the character data (the pointer and the data it points to are in completely
different memory areas). Pointer chasing makes memory accesses impossible to
predict, and thus makes access latencies difficult for the CPU to hide.

Anyway, this is just theoretical speculation; I think running benchmarks and
comparing performance numbers is the most reasonable thing we can do (which is a
bit difficult since we don't have real-world benchmarks for string processing;
stringbench and pybench most probably run from the CPU cache and thus don't
really stress memory access patterns, which is why I chose the simplistic
split() of a very large string to demonstrate the performance of my patches).
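For illustration, a micro-benchmark in that spirit could look like the sketch
below (the word count and pass count here are my own choices, not the ones from
the original patches; the idea is just that the string should be far larger
than the CPU caches so memory access patterns dominate):

```python
import timeit

# Build a string large enough not to fit in CPU caches, so that
# splitting it stresses memory access patterns rather than just
# per-call overhead.
N = 1_000_000                     # number of words (an assumption)
big = " ".join(["word"] * N)      # roughly 5 MB of character data

t = timeit.timeit("big.split()", globals={"big": big}, number=5)
print(f"split() of a {len(big) / 1e6:.1f} MB string: {t / 5:.3f} s/pass")
```

Each pass allocates about a million small string objects, which is exactly the
allocation pattern the PyVarObject vs. two-allocation discussion is about.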

>  * objects should be small in order to have lots of them in
>    the free lists

But the freelists are less efficient, since they avoid only one of the two
allocations. And if you make them avoid both allocations, then the freelists
actually take more memory (because each cached entry keeps its separately
allocated character buffer alive).
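The trade-off can be sketched generically with a toy freelist in Python (this
is not CPython's actual C implementation; the class, the cap, and the `payload`
attribute are all hypothetical, chosen only to mirror the one-allocation-of-two
situation):

```python
class Buf:
    """Toy object whose wrapper is recycled through a freelist."""

    _freelist = []
    _FREELIST_MAX = 80  # arbitrary cap, analogous to a freelist size limit

    def __new__(cls):
        # Reuse a previously released wrapper if one is cached: this
        # avoids re-allocating the wrapper object itself...
        if cls._freelist:
            return cls._freelist.pop()
        return super().__new__(cls)

    def release(self):
        # ...but the separately allocated payload is dropped here, so a
        # reused wrapper must still allocate a fresh payload: the
        # freelist saves only one of the two allocations.
        self.payload = None
        if len(type(self)._freelist) < type(self)._FREELIST_MAX:
            type(self)._freelist.append(self)

b = Buf()
b.payload = bytearray(100)   # the second, separate allocation
b.release()
b2 = Buf()
assert b2 is b               # the wrapper was recycled from the freelist
```

Keeping `payload` alive in `release()` instead would save both allocations, but
then every cached entry would pin its buffer in memory, which is the overhead
mentioned above.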

Also, those two arguments could be made for lists vs. tuples, but I've never
seen anyone dispute that tuples are more efficient than lists.
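For what it's worth, the footprint difference between the two layouts is
visible from pure Python via sys.getsizeof (the exact byte counts vary across
CPython versions, so only the relative ordering matters here):

```python
import sys

# A tuple stores its item pointers inline in a single allocation,
# while a list keeps a separately allocated, resizable item array
# (whose slots getsizeof also counts).
t = (1, 2, 3)
l = [1, 2, 3]
print(sys.getsizeof(t), sys.getsizeof(l))
```

The tuple comes out smaller for the same three items, mirroring the
PyVarObject-style layout discussed above.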

> IMHO, it's a lot better to tweak the parameters that we have
> in the Unicode implementation (e.g. raise the KEEPALIVE_SIZE_LIMIT
> to 32, see the ticket for details) and to improve
> the memory allocator for storage of small memory chunks or
> improve the free list management (which Antoine did with his
> free list patch).

But that patch, as I said above, yields very mixed results; it even degrades
performance in some cases. I'm not against applying (some variant of) it, but
it's really no game-changer.

Regards

Antoine.




More information about the Python-3000 mailing list