On 14/12/2021 19.19, Eric Snow wrote:
> A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example, we could use the highest bit of the ref count as an immutable indicator and do something like

    ob_refcnt += !(ob_refcnt >> 63)

instead of ob_refcnt++. The code performs "ob_refcnt += 1" when the highest bit is not set and "ob_refcnt += 0" when the bit is set. I have tested neither whether the approach actually works nor its performance.

Christian
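
For illustration, the branchless update suggested above could be sketched in C roughly as follows. This is a minimal sketch and not CPython code: obj_t, IMMORTAL_BIT, incref_branchless() and decref_branchless() are hypothetical names, and the counter is assumed to be an unsigned 64-bit integer so that the right shift is well defined (the real ob_refcnt is a signed Py_ssize_t).

```c
#include <stdint.h>

/* Hypothetical stand-in for an object header; only the refcount is modeled. */
typedef struct {
    uint64_t ob_refcnt;
} obj_t;

/* Assumption: the highest bit marks an object whose count must not change. */
#define IMMORTAL_BIT (UINT64_C(1) << 63)

/* Adds 1 when the highest bit is clear and 0 when it is set.
 * (ob_refcnt >> 63) is 0 or 1, so the logical NOT yields 1 or 0. */
static inline void incref_branchless(obj_t *op) {
    op->ob_refcnt += !(op->ob_refcnt >> 63);
}

/* Same idea for the decrement: subtracts 1 or 0 depending on the bit. */
static inline void decref_branchless(obj_t *op) {
    op->ob_refcnt -= !(op->ob_refcnt >> 63);
}
```

With this sketch, an object whose count starts at 1 goes to 2 after incref_branchless(), while an object whose count has IMMORTAL_BIT set keeps its value unchanged. Note that a real Py_DECREF() still has to check whether the count reached zero in order to deallocate the object, so only the increment side becomes entirely branch-free. Whether a compiler emits branch-free machine code for this, and whether it is actually faster, would have to be measured.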