On Wed, Dec 15, 2021 at 2:21 AM Antoine Pitrou <antoine@python.org> wrote:
On Wed, 15 Dec 2021 10:42:17 +0100 Christian Heimes <christian@python.org> wrote:
On 14/12/2021 19.19, Eric Snow wrote:
A while back I concluded that neither approach would work for us. The approach I had taken would have significant cache performance penalties in a per-interpreter GIL world. The approach that modifies Py_INCREF() has a significant performance penalty due to the extra branch on such a frequent operation.
Would it be possible to write the Py_INCREF() and Py_DECREF() macros in a way that does not depend on branching? For example, we could use the highest bit of the ref count as an immutable indicator and do something like:
ob_refcnt += !(ob_refcnt >> 63)
instead of
ob_refcnt++
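For reference, the two variants under discussion can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `obj_t`, `IMMORTAL_BIT`, `incref_branch` and `incref_branchless` are hypothetical names, not CPython's actual definitions, and the refcount is modeled as an unsigned 64-bit field so that the shift by 63 is well defined (CPython's real `ob_refcnt` is a signed `Py_ssize_t`).

```c
#include <stdint.h>

/* Hypothetical stand-in for an object header; not CPython's real PyObject. */
typedef struct {
    uint64_t ob_refcnt;
} obj_t;

/* Assumption for this sketch: bit 63 set marks the object as immortal. */
#define IMMORTAL_BIT (UINT64_C(1) << 63)

/* Branching variant: the store is skipped entirely for immortal objects. */
static inline void incref_branch(obj_t *op)
{
    if (!(op->ob_refcnt & IMMORTAL_BIT)) {
        op->ob_refcnt++;
    }
}

/* Branchless variant from the suggestion above: adds 1 for ordinary
 * objects and 0 for immortal ones. At the source level the "+= 0" case
 * is still a read-modify-write of the refcount field. */
static inline void incref_branchless(obj_t *op)
{
    op->ob_refcnt += !(op->ob_refcnt >> 63);
}
```

With these definitions, calling `incref_branchless()` on an object whose `ob_refcnt` is 1 leaves it at 2, while an object with the top bit set keeps its value unchanged. Whether the unchanged value is still written back to memory is up to the compiler and the CPU.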
Probably, but that would also issue spurious writes to immortal refcounts from different threads at once, so might end up worse performance-wise.
Unless the CPU is clever enough to skip claiming the cacheline in exclusive-mode for a "+= 0". Which I guess is something you'd have to check empirically on every microarch and instruction pattern you care about, because there's no way it's documented. But maybe? CPUs are very smart, except when they aren't.

-n

--
Nathaniel J. Smith -- https://vorpus.org