[Python-Dev] CPython optimization: storing reference counters outside of objects

Sun May 22 16:23:55 CEST 2011

>> 1. CPU cache lines (64 bytes on X86) containing the beginning of a
>> PyObject are very often invalidated, resulting in losing many chances
>> to use the CPU caches
>
> Mutating data doesn't invalidate a cache line. It just makes it
> necessary to write it back to memory at some point.
>

I think he's referring to the multi-core case.
In MESI terminology, the cache line becomes Modified in the cache of
the core running the current thread, but Invalid in the other cores'
caches. But given that access to objects is serialized by the GIL
(which issues a memory barrier anyway), I'm not sure the performance
impact will be noticeable. Furthermore, since threads are effectively
serialized, I suspect the scheduler naturally tends to keep them on
the same CPU.

>> 2. The copy-on-write after fork() optimization (Linux) is almost
>> useless in CPython, because even if you don't modify data directly,
>> refcounts are modified, and PyObjects with refcounts inside are spread
>> all over the process's memory (and one small refcount modification
>> causes the whole page - 4kB - to be copied into a child process).
>
> Indeed.
>

There was a bug report a couple of months ago from someone using large
datasets in a scientific application. He suggested adding support for
Linux's MADV_MERGEABLE, but the root cause is really that the
reference count gets incremented even when objects are only read.
For the record, it's http://bugs.python.org/issue9942 (and this idea
was brought up here).
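
To make the "read-only still writes" point concrete, here is a minimal
sketch, assuming a standard CPython build. The helper name
refcount_delta_when_shared is mine, purely for illustration; it is not
taken from the bug report. It only shows that storing a reference to an
object changes the counter kept inside that object, even though the
object's contents are never touched:

```python
import sys


def refcount_delta_when_shared(obj):
    """Return (delta while a consumer holds obj, delta after it lets go).

    Hypothetical helper for illustration only; assumes CPython, where
    the reference count is stored inside the object itself.
    """
    before = sys.getrefcount(obj)
    holder = [obj]                 # a consumer keeps a reference; obj is not mutated
    during = sys.getrefcount(obj)
    del holder                     # the consumer drops its reference again
    after = sys.getrefcount(obj)
    return during - before, after - before


payload = tuple(range(1000))       # immutable data, treated as read-only
delta_held, delta_released = refcount_delta_when_shared(payload)
print(delta_held, delta_released)
```

Each of those counter updates is a write to the memory holding the
object's header, which is what dirties the page in a forked child.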

cf