[pypy-dev] cpyext reference counting and other gc's

Mon May 2 20:14:45 CEST 2011

Sure,

I started with the basic concept of keeping dependencies in PyPy's
RPython code to a minimum; that is, PyPy sees an object defined or
created in a C extension as native.

Granted, I haven't checked all the details yet; e.g. the CPython
protocols probably need to be wrapped in cpyext anyway. I don't know
whether this is possible in the general case; I still need to
investigate the hack I'm thinking of here.

I looked through the code of quite a few CPython C extensions; most
make only minimal use of the CPython API. Moreover, it occurs to me
that it is important to support the sheer number of simple CPython
extensions (the rationale being that Python is commonly used to glue
large-ish software modules together). The few extensions that do
something non-trivial would probably need to be patched to take full
advantage of PyPy anyway.

The consequence is that the Py* "symbols" (C source uses them
symbolically, so we are free to inject macros or define functions) can
become quite complex functions. If a C extension writer assumes that
incref is a very simple operation while repeatedly calling a
user-defined function is expensive, and over-optimizes their extension
with this in mind, the results could be funny :P

Please correct me if I have misunderstood cpyext; I'd be willing to
write cpyext up on the PyPy wiki.

Current cpyext creates a shadow object for every pointer that is
passed to or from C, and keeps this object for every pointer that C
uses internally, with the lifetime determined by the object's
reference count.

My concept is to shadow only the reference count.

When an object crosses the PyPy-C boundary, current cpyext has to
perform an allocation or a lookup anyway, so my proposal is rather
similar there, perhaps even cheaper. As far as incref/decref is
concerned, only the first lookup is potentially slow; subsequent
lookups hit the CPU cache anyway. Moreover, if there is significant
traffic to the reference count dict, the whole dict is probably cached.
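
To make the idea concrete, here is a minimal sketch in plain Python
(not RPython, and not actual cpyext code). The class and method names
are made up for illustration, and id() merely stands in for the
pointer that would be handed to C:

```python
class RefcountTable:
    """Hypothetical side table: C-visible address -> shadow refcount."""

    def __init__(self):
        self._counts = {}  # address (int) -> reference count held by C
        self._alive = {}   # address -> object, keeps it reachable for the GC

    def incref(self, obj):
        # id() is an assumption standing in for the real pointer value.
        addr = id(obj)
        if addr not in self._counts:
            # First crossing of the boundary: the only "slow" path.
            self._counts[addr] = 0
            self._alive[addr] = obj
        self._counts[addr] += 1
        return addr

    def decref(self, addr):
        count = self._counts[addr] - 1
        if count == 0:
            # C no longer holds the object; hand it back to the GC.
            del self._counts[addr]
            del self._alive[addr]
        else:
            self._counts[addr] = count
        return count

    def refcount(self, addr):
        return self._counts.get(addr, 0)
```

Every incref/decref here is a dict lookup, which is exactly the cost
in question; the object itself carries no extra shadow structure.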

Another possible advantage is unloading C extension modules, as we can
attribute privately stored Python objects to the module that holds
them; although, as this is not done in CPython, most modules are
probably broken in that respect.
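
A rough sketch of what attributing references to modules could look
like, again in plain Python with hypothetical names; how the calling
module would actually be identified at incref time is left open:

```python
from collections import Counter, defaultdict


class PerModuleRefs:
    """Hypothetical bookkeeping: which module holds which references."""

    def __init__(self):
        # module name -> Counter(address -> references held by that module)
        self._by_module = defaultdict(Counter)

    def incref(self, module, addr):
        self._by_module[module][addr] += 1

    def decref(self, module, addr):
        counts = self._by_module[module]
        counts[addr] -= 1
        if counts[addr] == 0:
            del counts[addr]

    def unload(self, module):
        # Return whatever the module still holds so it can be released.
        return dict(self._by_module.pop(module, {}))
```

On unload, the returned mapping lists the addresses (and counts) the
module never released, so they can be dropped on its behalf.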

Cheers,
Dima Tisnek

p.s.
I'd really like to know the reasons why some C extensions do not work
with cpyext right now.
Can anyone weigh in on dict vs. custom data structure lookup, e.g. a
splay tree vs. pointer-chasing linked shadow objects?
Is there a better data structure than a hash table (void* -> ssize_t)?

On 26 April 2011 09:23, Armin Rigo <arigo at tunes.org> wrote:
> Hi Dima,
>
> On Mon, Apr 25, 2011 at 9:53 PM, Dima Tisnek <dimaqq at gmail.com> wrote:
>> https://docs.google.com/document/d/1k7t-WIsfKW4tIL9i8-7Y6_9lo18wcsibyDONOF2i_l8/edit?hl=en
>
> Can you explain a bit more what are the advantages of the solution you
> propose, compared to what is already implemented in cpyext?  Your
> description is far too high-level for us to know exactly what you
> mean.
>
> It seems that you want to replace the currently implemented solution
> with a very different one.  I can explain it in a bit more detail,
> but I would first like to hear what goal you are trying to achieve.
> Here is a quick reply based on guessing.  The issue with your version
> is that Py_INCREF() and Py_DECREF() need to do a slow dictionary
> lookup, while ours don't.  Conversely, I believe that your version
> doesn't need a dictionary lookup in other cases where ours needs to.
> However it seems to me that if you add so much overhead to Py_INCREF()
> and Py_DECREF(), you lose all other speed advantages.
>
>
> A bientôt,
>
> Armin.
>