Mailman 3 cpyext reference counting and other gc's - pypy-dev

cpyext reference counting and other gc's

Dima Tisnek

March 26, 2011

4:53 a.m.

Hey, I had a look at cpyext recently and saw that reference counting is emulated with, err, reference counting, seemingly in the referenced object itself. Does this mean that cpyext would not work with other gc's or is there some wrapping going on behind the scenes?


Amaury Forgeot d'Arc

March 2011

8:33 a.m.

Hi, 2011/3/26 Dima Tisnek <dimaqq@gmail.com>:

...

Hey, I had a look at cpyext recently and saw that reference counting is emulated with, err, reference counting, seemingly in the referenced object itself.

Does this mean that cpyext would not work with other gc's or is there some wrapping going on behind the scenes?

Cpyext works with all pypy gc's. The PyObject* exposed to C code is actually a proxy to the "real" interpreter object; a dict lookup is necessary each time a reference crosses the C/pypy boundary. Yes, this is slow.

This is implemented in pypy/module/cpyext/pyobject.py; the main functions are create_ref() and from_ref().

-- Amaury Forgeot d'Arc
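To make the proxy idea concrete, here is a minimal toy model in plain Python. It is not the code in pyobject.py: the `CProxy` and `ProxyTable` classes are hypothetical, and only the names create_ref() and from_ref() are borrowed from the message above. The assumption is simply that the count lives in the proxy and that a dict maps in both directions.

```python
class CProxy:
    """Hypothetical stand-in for the PyObject* struct handed to C code."""

    def __init__(self):
        self.refcnt = 1  # the count lives in the proxy, not in the real object


class ProxyTable:
    """Toy two-way mapping between interpreter objects and their proxies."""

    def __init__(self):
        self._to_proxy = {}   # id(interpreter object) -> CProxy
        self._to_object = {}  # id(proxy) -> interpreter object (keeps it alive)

    def create_ref(self, w_obj):
        # Interpreter -> C: one dict lookup, plus an allocation the first time.
        proxy = self._to_proxy.get(id(w_obj))
        if proxy is None:
            proxy = CProxy()
            self._to_proxy[id(w_obj)] = proxy
            self._to_object[id(proxy)] = w_obj
        else:
            proxy.refcnt += 1
        return proxy

    def from_ref(self, proxy):
        # C -> interpreter: one dict lookup.
        return self._to_object[id(proxy)]

    def decref(self, proxy):
        # Plain field update; the table is only touched when the count hits zero.
        proxy.refcnt -= 1
        if proxy.refcnt == 0:
            w_obj = self._to_object.pop(id(proxy))
            del self._to_proxy[id(w_obj)]
```

Because the interpreter object is only reachable through the table, the garbage collector is free to manage it however it likes, which is one way to read "works with all pypy gc's".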

Dima Tisnek

5:03 p.m.

I have an alternative idea in mind. I'll write up a doc, stick it on github and share it with you guys in a couple of days.

Thanks for a clear answer, I just couldn't figure that out from the code easily :P

d.

On 26 March 2011 01:33, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:

...


Dima Tisnek

April 2011

7:53 p.m.

Apologies that it took a little long. Here is the doc describing the idea and its side effects:

https://docs.google.com/document/d/1k7t-WIsfKW4tIL9i8-7Y6_9lo18wcsibyDONOF2i...

Since many of you are now busy with the release, I'm only asking for quick feedback, especially if I missed something obvious.

Thanks,
Dima Tisnek

On 26 March 2011 10:03, Dima Tisnek <dimaqq@gmail.com> wrote:

...


Armin Rigo

4:23 p.m.

Hi Dima, On Mon, Apr 25, 2011 at 9:53 PM, Dima Tisnek <dimaqq@gmail.com> wrote:

...

https://docs.google.com/document/d/1k7t-WIsfKW4tIL9i8-7Y6_9lo18wcsibyDONOF2i...

Can you explain a bit more what the advantages of the solution you propose are, compared to what is already implemented in cpyext? Your description is far too high-level for us to know exactly what you mean.

It seems that you want to replace the currently implemented solution with a very different one. I can explain it in a bit more detail, but I would first like to hear what goal you are trying to achieve. Here is a quick reply based on guessing. The issue with your version is that Py_INCREF() and Py_DECREF() need to do a slow dictionary lookup, while ours don't. Conversely, I believe that your version doesn't need a dictionary lookup in other cases where ours needs to. However, it seems to me that if you add so much overhead to Py_INCREF() and Py_DECREF(), you lose all other speed advantages.

A bientôt,

Armin.
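The trade-off in this reply can be sketched by counting dictionary lookups under each scheme. This is a toy model under loud assumptions: the class names are hypothetical, the "proxy" scheme pays one lookup at the boundary and none on incref/decref, and the "side table" scheme hands the object over directly and pays one lookup on every incref/decref. Neither class reflects actual cpyext code or the proposal document, and freeing at zero is left out of both.

```python
class Proxy:
    """Hypothetical proxy struct carrying its own reference count."""
    __slots__ = ("refcnt", "obj")

    def __init__(self, obj):
        self.refcnt = 0
        self.obj = obj


class ProxyScheme:
    """Count stored in the proxy; the dict is consulted only at the boundary."""

    def __init__(self):
        self.lookups = 0
        self._proxies = {}  # id(obj) -> Proxy

    def cross_boundary(self, obj):
        self.lookups += 1
        return self._proxies.setdefault(id(obj), Proxy(obj))

    def incref(self, handle):
        handle.refcnt += 1  # plain field update, no lookup

    def decref(self, handle):
        handle.refcnt -= 1  # plain field update, no lookup


class SideTableScheme:
    """Only the count is shadowed, in a table keyed by object address."""

    def __init__(self):
        self.lookups = 0
        self._counts = {}  # id(obj) -> [refcnt, obj]

    def cross_boundary(self, obj):
        return obj  # assumed: the object itself is handed over, no lookup

    def incref(self, handle):
        self.lookups += 1
        self._counts.setdefault(id(handle), [0, handle])[0] += 1

    def decref(self, handle):
        self.lookups += 1
        self._counts[id(handle)][0] -= 1


def count_lookups(scheme, pairs):
    """Cross the boundary once, then do `pairs` incref/decref pairs."""
    handle = scheme.cross_boundary(object())
    for _ in range(pairs):
        scheme.incref(handle)
        scheme.decref(handle)
    return scheme.lookups
```

The sketch only counts lookups; it says nothing about how expensive each one is in practice.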

Dima Tisnek

May 2011

6:14 p.m.

Sure, I started with the basic concept of having as few dependencies in pypy rpython code as possible, that is, pypy sees an object defined/created in a C extension as native. Granted, I didn't check all the details yet, e.g. cpython protocols probably need to be wrapped in cpyext anyway. I don't know if this is possible in the general case; I still need to investigate the hack I'm thinking of here.

I looked through the code of quite a few cpython C extensions; most make only minimal use of the cpython api. Moreover, it occurs to me that it is important to support the sheer number of simple cpython extensions (the rationale is that python is commonly used to glue large-ish software modules together). The few extensions that do something non-trivial would probably need to be patched to take full advantage of pypy anyway.

The consequence is that Py* "symbols" (C source uses those symbolically, so we are free to inject macros or define functions) can become quite complex functions. If a C extension writer assumes that incref is a very simple operation while repeatedly calling a user-defined function is complex, and overoptimizes their extension with this view, the results could be funny :P

Please correct me if I misunderstood cpyext; I'd be willing to write up cpyext on the pypy wiki. Current cpyext creates a shadow object for every pointer that is passed to or from C, and keeps this object for every pointer that is used in C internally, with its lifetime tracked through the object's reference count. My concept is to shadow only the reference count.

When an object crosses the pypy-C boundary, current cpyext has to perform an allocation or lookup anyway; it is rather similar with my proposal, and perhaps my proposal is even cheaper here. As far as incref/decref is concerned, only the first lookup is potentially slow; subsequent lookups are in cpu cache anyway. Moreover, if there's significant traffic to the reference count dict, the whole dict is probably cached.

Another possible advantage is unloading C extension modules, as we can attribute privately stored python objects to the module that holds them; although, as this is not done in cpython, most modules are probably broken.

Cheers,
Dima Tisnek

p.s. I'd really like to know what the reasons are that some C extensions do not work with cpyext right now.

Can anyone weigh in on dict/custom data structure lookup, e.g. splay tree vs pointer-chasing linked shadow objects? Is there a better data structure than a hash table (void* -> ssize_t)?

On 26 April 2011 09:23, Armin Rigo <arigo@tunes.org> wrote:

...


Amaury Forgeot d'Arc

6:44 p.m.

Hi, 2011/5/2 Dima Tisnek <dimaqq@gmail.com>:

...

I'd really like to know what are reasons some c extension do not work with cpyext right now.

I can see three kinds of reasons:

- Reference count mistakes that normally don't show up in CPython (but many C extensions do crash when you delete them from sys.modules, then reimport them; cpyext just detects the failure the first time :-)

- Unsupported functions, either because we never encountered them, or because they are really hard to support in pypy

- Extensions that play too much with CPython internals: numpy and Cython are in this case

...

can anyone weigh in on dict/custom data structure lookup, e.g. splay tree vs pointer-chasing linked shadow objects? is there a better data structure than a hash table (void* -> ssize_t)

Probably! The hash table is used because it's readily available in RPython. But if you care to provide a RPython implementation of the associative container, I'd be happy to test it. -- Amaury Forgeot d'Arc

Dima Tisnek

6:53 p.m.

Hi Amaury, thanks for a quick reply. Btw, which api functions are hard to support in pypy, and why?

d.

On 2 May 2011 11:44, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:

...


Amaury Forgeot d'Arc

7:09 p.m.

2011/5/2 Dima Tisnek <dimaqq@gmail.com>:

...

Hi Amaury, thanks for a quick reply, btw, which api functions are hard to support in pypy or why some are?

Fortunately, there are not so many of them:

- PyFile_AsFile(), PyFile_FromFile(), because files opened by pypy don't use a FILE* (like python3)

- PyThreadState creation and deletion

- PyInterpreter creation and deletion

- Py_Initialize and Py_Finalize (to embed a python interpreter inside an application)

- Some trace and Traceback management functions that are not even documented (but used by Cython :-))

-- Amaury Forgeot d'Arc

Roger Binns

9:21 p.m.


...

- Some trace and Traceback management functions that are not even documented (but used by Cython :-))

I use them too. One of the problems with the standard documented C api is that tracebacks do not include any methods implemented in C. For example, if a Python method calls a C method which calls a Python method which errors, then the traceback won't include the C code. This is very confusing if you don't know why the C code called a Python method.

I show this in the doc for my project:

http://apidoc.apsw.googlecode.com/hg/exceptions.html#augmented-stack-traces

As the doc shows, I end up adding in synthetic stack frames so you can clearly see the C code. I also augment the synthetic frames with local variables that can be introspected to find out what is going on. The very bottom of the page shows what a difference that makes.

You can see the C code to do it here:

http://code.google.com/p/apsw/source/browse/src/traceback.c

Note the use of functions like PyThreadState_Get() and PyTraceBack_Here(). As long as the signature of AddTraceBackHere can remain the same, I don't care what the body inside is for pypy.

Roger

Amaury Forgeot d'Arc

10:08 p.m.

Hi, 2011/5/2 Roger Binns <rogerb@rogerbinns.com>:

...


Note the use of functions like PyThreadState_Get() and PyTraceBack_Here(). As long as the signature of AddTraceBackHere can remain the same, I don't care what the body inside is for pypy.

Yes, we've implemented PyTraceBack_Here() so that it works exactly for this usage. Can you check whether pypy does the right thing for you as well?

-- Amaury Forgeot d'Arc

Roger Binns

11:41 p.m.

On 05/02/2011 03:08 PM, Amaury Forgeot d'Arc wrote:

...

Yes, we've implemented PyTraceBack_Here() so that it works exactly for this usage. Can you check whether pypy does the right thing for you as well?

I will once I can get anything actually working. Currently having problems with pypy crashing. (The code does compile though.)

Note that I am perfectly happy to change my code to something that works with pypy - I already had to make it work the different ways that Python 2 and 3 do.

Roger

exarkun＠twistedmatrix.com

10:51 p.m.

On 07:09 pm, amauryfa@gmail.com wrote:

...

2011/5/2 Dima Tisnek <dimaqq@gmail.com>:

...
Hi Amaury, thanks for a quick reply, btw, which api functions are hard to support in pypy or why some are?

Fortunately, there are not so many of them: - PyFile_AsFile(), PyFile_FromFile, because files opened by pypy don't use a FILE* (like python3)

Do fdopen(3) and fileno(3) not help here? I can understand how there might be synchronization issues if it were implemented this way... I've never used these CPython APIs myself. You didn't have the memoryview APIs on your list. Does that mean they're easy? Then can you implement them? ;) Jean-Paul

Roger Binns

5:04 a.m.

On 05/02/2011 03:51 PM, exarkun@twistedmatrix.com wrote:

...

On 07:09 pm, amauryfa@gmail.com wrote:

...
Fortunately, there are not so many of them: - PyFile_AsFile(), PyFile_FromFile, because files opened by pypy don't use a FILE* (like python3)

Do fdopen(3) and fileno(3) not help here?

That gets you the OS handle, but the FILE* has a whole bunch of other crud inside, like a buffer, current offsets and eol handling. If it is just a one-time transfer from Python to C or vice versa then it could work, but if both use the file/FILE concurrently then chances are you'll end up with file corruption.

Roger
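The buffering hazard described here can be reproduced without any C code. The sketch below is an analogy, not the cpyext situation itself: it assumes two independent buffered Python file objects over the same OS descriptor (one via `os.dup`) behave like a Python file and a C `FILE*` sharing one handle. The function name `interleaved_writes` is made up for the illustration.

```python
import os
import tempfile


def interleaved_writes() -> bytes:
    """Write through two buffered wrappers that share one OS-level file."""
    fd, path = tempfile.mkstemp()
    try:
        a = os.fdopen(fd, "wb")          # first wrapper, with its own buffer
        b = os.fdopen(os.dup(fd), "wb")  # second wrapper, same underlying file
        a.write(b"first ")               # sits in a's buffer only
        b.write(b"second ")              # sits in b's buffer only
        b.flush()                        # reaches the file first
        a.flush()                        # lands after it, despite being written earlier
        a.close()
        b.close()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


print(interleaved_writes())  # b'second first '
```

The data ends up in flush order rather than write order, because each wrapper only knows about its own buffer. With larger or concurrent writes the same effect shows up as the corruption mentioned above.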
