[pypy-dev] Py_DecRef() in cpyext

Stefan Behnel stefan_ml at behnel.de
Sun Feb 26 11:00:01 CET 2012

Stefan Behnel, 26.02.2012 09:50:
> when I took a look at object.h and saw that the Py_DECREF() macro *always*
> calls into it. Another surprise.
> I had understood in previous discussions that the refcount emulation in
> cpyext only counts C references, which I consider a suitable design. (I
> guess something as common as Py_None uses the obvious optimisation of
> always having a ref-count > 1, right? At least when not debugging...)
> So I changed the macros to use an appropriate C-level implementation:
> """
> #define Py_INCREF(ob)  ((((PyObject *)(ob))->ob_refcnt > 0) ? \
>      (void)(((PyObject *)(ob))->ob_refcnt++) : Py_IncRef((PyObject *)(ob)))
> #define Py_DECREF(ob)  ((((PyObject *)(ob))->ob_refcnt > 1) ? \
>      (void)(((PyObject *)(ob))->ob_refcnt--) : Py_DecRef((PyObject *)(ob)))
> #define Py_XINCREF(op) do { if ((op) == NULL) ; else Py_INCREF(op); \
>                           } while (0)
> #define Py_XDECREF(op) do { if ((op) == NULL) ; else Py_DECREF(op); \
>                           } while (0)
> """
> to tell the C compiler that it doesn't actually need to call into PyPy in
> most cases (note that I didn't use any branch prediction macros, but that
> shouldn't change all that much anyway). This shaved off a couple of cycles
> from my iteration benchmark, but much less than I would have liked. My
> intuition tells me that this is because almost all objects that appear in
> the benchmark are actually short-lived in C space so that pretty much every
> Py_DECREF() on them kills them straight away and thus calls into
> Py_DecRef() anyway. To be verified with a better test.

Ok, here's a stupid micro-benchmark for ref-counting:

def bench(x):
    cdef int i
    for i in xrange(10000):
        a = x
        b = x
        c = x
        d = x
        e = x
        f = x
        g = x

Leads to the obvious C code. :) (and yes, this will eventually stop
actually being a benchmark in Cython...)

When always calling into Py_IncRef() and Py_DecRef(), I get this:

$ pypy -m timeit -s 'from refcountbench import bench' 'bench(10)'
1000 loops, best of 3: 683 usec per loop

With the macros above, I get this:

$ pypy -m timeit -s 'from refcountbench import bench' 'bench(10)'
1000 loops, best of 3: 385 usec per loop

So that's faster by a factor of about 1.8, just because the C compiler can
handle most of the ref-counting internally once there is more than one C
reference to an object. It will obviously be a lot less than that for
real-world code, but I think it makes it clear enough that it's worth
putting some effort into ways to avoid calling back and forth across the
border for no good reason.

