2011/5/23 "Martin v. Löwis" <martin@v.loewis.de>

> I'm not a compiler/profiling expert so the main question is if such
> design can work, and maybe someone was thinking about something
> similar?

My expectation is that your approach would likely make the issues
worse in a multi-CPU setting. If you put multiple reference counters
into a contiguous block of memory, unrelated reference counters will
live in the same cache line. Consequently, changing one reference
counter on one CPU will invalidate the cached reference counters of
that cache line on the other CPUs, making your problem a) actually worse.
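
To make the cache-line sharing concrete, here is a minimal sketch of the address arithmetic. It assumes 64-byte cache lines and 8-byte counters; the pool base address and the helper names are hypothetical and only serve the illustration.

```python
CACHE_LINE = 64    # assumed cache line size in bytes
COUNTER_SIZE = 8   # assumed size of one reference counter (64-bit build)


def cache_line(addr):
    """Index of the cache line that contains the byte at addr."""
    return addr // CACHE_LINE


def pooled_counter_addr(pool_base, index):
    """Address of counter number `index` in a contiguous pool (hypothetical layout)."""
    return pool_base + index * COUNTER_SIZE


def counters_sharing_line(pool_base, index, n_counters):
    """Indices of the other counters that live on the same cache line as `index`."""
    line = cache_line(pooled_counter_addr(pool_base, index))
    return [
        i
        for i in range(n_counters)
        if i != index and cache_line(pooled_counter_addr(pool_base, i)) == line
    ]


pool_base = 0x10000  # hypothetical, cache-line-aligned pool start
neighbours = counters_sharing_line(pool_base, 3, 32)
print(neighbours)  # [0, 1, 2, 4, 5, 6, 7]
```

Under these assumptions, a write to counter 3 on one CPU invalidates the line that also holds the counters of seven unrelated objects.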

Regards,
Martin

I don't think that moving ob_refcnt to a proper memory pool will solve the problem of cache pollution anyway.

ob_refcnt is obviously the most stressed field in PyObject, but it's not the only one. We also have ob_type, which is needed to model each object's (instance's) "behavior" and is heavily accessed too, so its cache line will be loaded anyway whenever the object is used.

Also, only a few simple objects have just ob_refcnt and ob_type. Most of them have other fields too, and accessing them means a cache line load.

Regards,

Cesare

P.S. Memory allocation granularity can sometimes help, leaving some data (ob_refcnt and/or ob_type) on one cache line and the rest on the next one.
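
A small sketch of that last point, assuming 64-byte cache lines and 8-byte ob_refcnt and ob_type fields (so the first extra field sits at offset 16). The object addresses and the lines_touched helper are hypothetical.

```python
CACHE_LINE = 64  # assumed cache line size in bytes


def lines_touched(obj_addr, offsets):
    """Sorted cache line indices hit when accessing the given field offsets."""
    return sorted({(obj_addr + off) // CACHE_LINE for off in offsets})


# Offsets of ob_refcnt, ob_type, and the first extra field (assumed layout).
FIELDS = [0, 8, 16]

# Object starting on a cache line boundary: everything is on one line.
print(lines_touched(0x1000, FIELDS))  # [64]

# Object starting 16 bytes before a boundary: the header stays on one
# line and the first extra field lands on the next one.
print(lines_touched(0x1030, FIELDS))  # [64, 65]
```

Whether the split helps or hurts depends on which fields the hot code path actually touches.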