[pypy-dev] Py_DecRef() in cpyext

Sun Feb 26 11:00:01 CET 2012

Stefan Behnel, 26.02.2012 09:50:
> when I took a look at object.h and saw that the Py_DECREF() macro *always*
> calls into it. Another surprise.
> 
> I had understood in previous discussions that the refcount emulation in
> cpyext only counts C references, which I consider a suitable design. (I
> guess something as common as Py_None uses the obvious optimisation of
> always having a ref-count > 1, right? At least when not debugging...)
> 
> So I changed the macros to use an appropriate C-level implementation:
> 
> """
> #define Py_INCREF(ob)  ((((PyObject *)(ob))->ob_refcnt > 0) ? \
>      (void)(((PyObject *)(ob))->ob_refcnt++) : Py_IncRef((PyObject *)(ob)))
> 
> #define Py_DECREF(ob)  ((((PyObject *)(ob))->ob_refcnt > 1) ? \
>      (void)(((PyObject *)(ob))->ob_refcnt--) : Py_DecRef((PyObject *)(ob)))
> 
> #define Py_XINCREF(op) do { if ((op) == NULL) ; else Py_INCREF(op); \
>                           } while (0)
> 
> #define Py_XDECREF(op) do { if ((op) == NULL) ; else Py_DECREF(op); \
>                           } while (0)
> """
> 
> to tell the C compiler that it doesn't actually need to call into PyPy in
> most cases (note that I didn't use any branch prediction macros, but that
> shouldn't change all that much anyway). This shaved off a couple of cycles
> from my iteration benchmark, but much less than I would have liked. My
> intuition tells me that this is because almost all objects that appear in
> the benchmark are actually short-lived in C space so that pretty much every
> Py_DECREF() on them kills them straight away and thus calls into
> Py_DecRef() anyway. To be verified with a better test.

Ok, here's a stupid micro-benchmark for ref-counting:

def bench(x):
    cdef int i
    for i in xrange(10000):
        a = x
        b = x
        c = x
        d = x
        e = x
        f = x
        g = x

This leads to the obvious C code. :) (And yes, this will eventually stop
actually being a benchmark in Cython...)
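
In case the C code is less obvious than claimed: each assignment takes a new
reference to x and then releases whatever the variable held before. The
following is a hand-written sketch of that pattern, not actual Cython output.
The Obj struct, the INCREF/XDECREF macros and the names assign() and
bench_sketch() are made up for illustration and only model the counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject: only the reference count. */
typedef struct { long ob_refcnt; } Obj;

#define INCREF(o)   ((void)((o)->ob_refcnt++))
#define XDECREF(o)  do { if ((o) != NULL) (o)->ob_refcnt--; } while (0)

/* One Python-level assignment "var = x": take a new reference to x,
   then release the reference that var held before (if any). */
static void assign(Obj **var, Obj *x) {
    assert(x != NULL);
    Obj *old = *var;
    INCREF(x);
    *var = x;
    XDECREF(old);
}

/* Sketch of bench(x): seven locals are rebound to x in every iteration.
   Returns the highest refcount seen after an assignment. */
static long bench_sketch(Obj *x, int iterations) {
    Obj *locals[7] = {NULL};
    long peak = x->ob_refcnt;
    for (int i = 0; i < iterations; i++) {
        for (int k = 0; k < 7; k++) {
            assign(&locals[k], x);
            if (x->ob_refcnt > peak) peak = x->ob_refcnt;
        }
    }
    /* Function exit: drop the references held by the locals. */
    for (int k = 0; k < 7; k++) XDECREF(locals[k]);
    return peak;
}
```

So after the first iteration, every assignment is one incref immediately
followed by one decref on the same object, which is all the benchmark is
meant to measure.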

When always calling into Py_IncRef() and Py_DecRef(), I get this:

$ pypy -m timeit -s 'from refcountbench import bench' 'bench(10)'
1000 loops, best of 3: 683 usec per loop

With the macros above, I get this:

$ pypy -m timeit -s 'from refcountbench import bench' 'bench(10)'
1000 loops, best of 3: 385 usec per loop

So that's faster by a factor of about 1.8, just because the C compiler can
handle most of the ref-counting internally once there is more than one C
reference to an object. It will obviously be a lot less than that for
real-world code, but I think it makes it clear enough that it's worth
putting some effort into ways to avoid calling back and forth across the
border for no good reason.
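
To make the effect concrete, here is a small stand-alone model that counts
how often the loop would cross the border with and without the fast path.
It is only a sketch: Obj, slow_incref(), slow_decref(), the FAST_* macros
and run() are hypothetical stand-ins, the "slow" functions merely bump a
counter instead of calling into PyPy, and I assume the caller holds one C
reference to x on entry. The increment branches are cast to void so that
both arms of the conditional have the same type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject; only the C-level refcount matters. */
typedef struct { long ob_refcnt; } Obj;

static long slow_calls = 0;  /* simulated calls across the C/PyPy border */

/* Stand-ins for Py_IncRef()/Py_DecRef(); the real ones call into PyPy
   and may free the object, which this model does not simulate. */
static void slow_incref(Obj *o) { slow_calls++; o->ob_refcnt++; }
static void slow_decref(Obj *o) { slow_calls++; o->ob_refcnt--; }

/* Same shape as the macros above: stay in C while other C refs exist. */
#define FAST_INCREF(o) \
    (((o)->ob_refcnt > 0) ? (void)((o)->ob_refcnt++) : slow_incref(o))
#define FAST_DECREF(o) \
    (((o)->ob_refcnt > 1) ? (void)((o)->ob_refcnt--) : slow_decref(o))

/* Runs the seven-assignment loop and returns the number of slow calls. */
static long run(Obj *x, int iterations, int use_fast_path) {
    Obj *locals[7] = {NULL};
    assert(x != NULL && x->ob_refcnt >= 1);  /* caller's reference */
    slow_calls = 0;
    for (int i = 0; i < iterations; i++) {
        for (int k = 0; k < 7; k++) {
            Obj *old = locals[k];
            if (use_fast_path) FAST_INCREF(x); else slow_incref(x);
            locals[k] = x;
            if (old != NULL) {
                if (use_fast_path) FAST_DECREF(old); else slow_decref(old);
            }
        }
    }
    /* Function exit: release the locals. */
    for (int k = 0; k < 7; k++) {
        if (locals[k] != NULL) {
            if (use_fast_path) FAST_DECREF(locals[k]);
            else slow_decref(locals[k]);
        }
    }
    return slow_calls;
}
```

In this model, run(&x, 10000, 0) counts 140000 slow calls and
run(&x, 10000, 1) counts none, because the refcount of x never drops to 1
while the locals are alive. That is the best case; objects that die on
their first Py_DECREF() still take the slow path.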

Stefan