
Now that lxml works in PyPy, I've been excited to try Scrapy in PyPy 2.0, but I've run into this issue. I'm not sure what could be happening here, but I suspect it could be a twisted+pypy issue. I'm hoping it might look familiar to someone.

ERROR: Error caught on signal handler: <bound method LogStats.spider_opened of <scrapy.contrib.logstats.LogStats object at 0x0000000006f3a8e0>>
Traceback (most recent call last):
  File "/usr/local/pypy/site-packages/twisted/internet/defer.py", line 1045, in _inlineCallbacks
    result = g.send(result)
  File "/usr/local/pypy/site-packages/scrapy/core/engine.py", line 225, in open_spider
    yield self.signals.send_catch_log_deferred(signals.spider_opened, spider=spider)
  File "/usr/local/pypy/site-packages/scrapy/signalmanager.py", line 23, in send_catch_log_deferred
    return signal.send_catch_log_deferred(*a, **kw)
  File "/usr/local/pypy/site-packages/scrapy/utils/signal.py", line 53, in send_catch_log_deferred
    *arguments, **named)
--- <exception caught here> ---
  File "/usr/local/pypy/site-packages/twisted/internet/defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/pypy/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 47, in robustApply
    return receiver(*arguments, **named)
exceptions.TypeError: spider_opened() got 2 unexpected keyword arguments

ERROR: Error caught on signal handler: <bound method LogStats.response_received of <scrapy.contrib.logstats.LogStats object at 0x0000000006f3a8e0>>
Traceback (most recent call last):
  File "/usr/local/pypy/site-packages/twisted/internet/defer.py", line 464, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/pypy/site-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/pypy/site-packages/scrapy/core/engine.py", line 200, in _on_success
    response=response, request=request, spider=spider)
  File "/usr/local/pypy/site-packages/scrapy/signalmanager.py", line 19, in send_catch_log
    return signal.send_catch_log(*a, **kw)
--- <exception caught here> ---
  File "/usr/local/pypy/site-packages/scrapy/utils/signal.py", line 22, in send_catch_log
    *arguments, **named)
  File "/usr/local/pypy/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 47, in robustApply
    return receiver(*arguments, **named)
exceptions.TypeError: response_received() got 4 unexpected keyword arguments

ERROR: Error caught on signal handler: <bound method CoreStats.response_received of <scrapy.contrib.corestats.CoreStats object at 0x00000000061b8d08>>
Traceback (most recent call last):
  File "/usr/local/pypy/site-packages/twisted/internet/defer.py", line 464, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/pypy/site-packages/twisted/internet/defer.py", line 551, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/pypy/site-packages/scrapy/core/engine.py", line 200, in _on_success
    response=response, request=request, spider=spider)
  File "/usr/local/pypy/site-packages/scrapy/signalmanager.py", line 19, in send_catch_log
    return signal.send_catch_log(*a, **kw)
--- <exception caught here> ---
  File "/usr/local/pypy/site-packages/scrapy/utils/signal.py", line 22, in send_catch_log
    *arguments, **named)
  File "/usr/local/pypy/site-packages/scrapy/xlib/pydispatch/robustapply.py", line 47, in robustApply
    return receiver(*arguments, **named)
exceptions.TypeError: response_received() got 4 unexpected keyword arguments

Here are the definitions for CoreStats and LogStats:

https://github.com/scrapy/scrapy/blob/0.16/scrapy/contrib/logstats.py
https://github.com/scrapy/scrapy/blob/0.16/scrapy/contrib/corestats.py

Let me know if this is a PyPy bug and I will turn it into a bug report.

Thanks
-Joe

Hi, 2012/12/3 Joe Hillenbrand <joehillen@gmail.com>
exceptions.TypeError: spider_opened() got 2 unexpected keyword arguments
Could you modify this spider_opened function a bit, so that it accepts **kwargs? Something like (not tested!):

    def spider_opened(self, spider, **kwargs):
        if kwargs:
            raise TypeError("unexpected keywords", kwargs)
        ...

-- Amaury Forgeot d'Arc

Hi, On Sun, Dec 2, 2012 at 4:09 PM, Joe Hillenbrand <joehillen@gmail.com> wrote:
(...) response=response, request=request, spider=spider) (...) return receiver(*arguments, **named) exceptions.TypeError: response_received() got 4 unexpected keyword arguments
No real clue, but it looks like keyword arguments are passed around as keyword arguments on PyPy, whereas CPython converts them to positional arguments somewhere along the call chain (which is long and goes via deferreds). For us to help more, please provide a step-by-step "how to reproduce" list. It's typically easy if we can reproduce the bug locally, and very hard if not. A bientôt, Armin.

I've narrowed down the problem thanks to Amaury's suggestion. It looks like it is caused by some black magic that tries to figure out which arguments a signal handler can accept:

https://github.com/scrapy/scrapy/blob/0.16/scrapy/xlib/pydispatch/robustappl...

I haven't completely figured out what it is trying to do, but it appears to touch interpreter innards that PyPy might not have (see the sketch after the quoted message below). It doesn't look particularly well written anyway, so it will probably need to be rewritten to be compatible with both CPython and PyPy.

On Mon, Dec 3, 2012 at 10:01 AM, Armin Rigo <arigo@tunes.org> wrote:
Hi,
On Sun, Dec 2, 2012 at 4:09 PM, Joe Hillenbrand <joehillen@gmail.com> wrote:
(...) response=response, request=request, spider=spider) (...) return receiver(*arguments, **named) exceptions.TypeError: response_received() got 4 unexpected keyword arguments
No real clue, but it looks like keyword arguments are passed around as keyword arguments on PyPy, whereas CPython converts them to positional arguments somewhere along the call chain (which is long and goes via deferreds). For us to help more, please provide a step-by-step "how to reproduce" list. It's typically easy if we can reproduce the bug locally, and very hard if not.
A bientôt,
Armin.
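For context, the introspection in question boils down to something like the following. This is a paraphrase from memory rather than the actual robustapply source (the real code is behind the truncated link above); the point is that it pokes at Python 2 function internals such as im_func and func_code to decide which keyword arguments to drop:

    def robust_apply_sketch(receiver, *arguments, **named):
        # Rough idea only: inspect the receiver's code object, keep the keyword
        # arguments it can accept, and silently drop the rest (for example the
        # signal/sender keywords the dispatcher passes to every receiver).
        func = getattr(receiver, 'im_func', receiver)   # unwrap bound methods (Python 2)
        code = func.func_code                           # CPython-specific introspection
        if not (code.co_flags & 0x08):                  # receiver has no **kwargs
            acceptable = code.co_varnames[:code.co_argcount]
            for name in list(named):
                if name not in acceptable:
                    del named[name]
        return receiver(*arguments, **named)

If that introspection goes wrong on PyPy, nothing gets filtered and the receiver ends up with "unexpected keyword arguments", which matches the tracebacks above.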

Armin Rigo, 03.12.2012 19:01:
On Sun, Dec 2, 2012 at 4:09 PM, Joe Hillenbrand wrote:
(...) response=response, request=request, spider=spider) (...) return receiver(*arguments, **named) exceptions.TypeError: response_received() got 4 unexpected keyword arguments
No real clue, but it looks like keyword arguments are passed around as keyword arguments on PyPy, whereas CPython converts them to positional arguments somewhere along the call chain (which is long and goes via deferreds).
I noticed that PyPy validates keyword arguments at a different stage than CPython, at least at the cpyext level. PyPy does it at call time, whereas CPython leaves it to the callee. That leads to this difference in the argument unpacking code for Cython-implemented functions:

https://github.com/cython/cython/blob/1cc172882ff754840f025ce81b2ed3183dfea9...

Stefan

I've found a place where PyPy and CPython disagree.

https://gist.github.com/4220533

This might not be the only issue, but it's the first thing I've found so far.

-Joe

On Wed, Dec 5, 2012 at 3:40 PM, Joe Hillenbrand <joehillen@gmail.com> wrote:
I've found a place where PyPy and CPython disagree.
https://gist.github.com/4220533
This might not be the only issue, but it's the first thing I've found so far.
-Joe
This is a well-known issue: PyPy's methods and builtin methods are not that different. We can probably fix it, though.

I was able to fix the issue with scrapy:

https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...

Unfortunately, scrapy takes twice as long in PyPy as in CPython. I suspect this is because lxml is twice as slow in PyPy as in CPython, which is what lxml's own benchmarks show. Should lxml be added to the set of speed tests?

On Thu, Dec 6, 2012 at 12:34 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
On Wed, Dec 5, 2012 at 3:40 PM, Joe Hillenbrand <joehillen@gmail.com> wrote:
I've found a place where PyPy and CPython disagree.
https://gist.github.com/4220533
This might not be the only issue, but it's the first thing I've found so far.
-Joe
This is a well-known issue: PyPy's methods and builtin methods are not that different. We can probably fix it, though.

On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand <joehillen@gmail.com> wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
No. lxml uses cpyext (the CPython extension compatibility layer), which is and will forever be slow.

Maciej Fijalkowski, 12.12.2012 20:10:
On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
no. lxml uses cpyext (CPython extension compatibility) that is and will forever be slow.
Well, I don't think it would be hard for any PyPy core developer to make it twice as fast. Shouldn't be more than a day's work. Stefan

Out of curiosity Stefan, if we had an alternate C-API with similar methods (e.g. PyPyList_Append or so), but different signatures and memory model, how hard do you think it would be to have Cython support this?

Alex

On Wed, Dec 12, 2012 at 11:35 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Maciej Fijalkowski, 12.12.2012 20:10:
On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
no. lxml uses cpyext (CPython extension compatibility) that is and will forever be slow.
Well, I don't think it would be hard for any PyPy core developer to make it twice as fast. Shouldn't be more than a day's work.
Stefan
-- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero

Alex Gaynor, 13.12.2012 08:43:
Out of curiosity Stefan, if we had an alternate C-API with similar methods (e.g. PyPyList_Append or so), but different signatures and memory model, how hard do you think it would be to have Cython support this?
Impossible to say in that generality. If it's only about exchanging C functions, it should be doable, but if it has an impact on Cython's type system, it might turn into a horrible mess to come up with something that works in both CPython and PyPy.

Also note that Cython knows a lot about reference counting internally. If that alternate C-API requires substantial changes to the way references are maintained in the C code, that would mean some work.

Also note that the amount of Cython code out there that uses explicit C-API calls for one reason or another is most likely rather large.

All in all, I'm not a fan of that one big revolution that will make everything beautiful, fast and shiny, but that will never happen, really. I prefer small steps that make things work.

Stefan

On Fri, Dec 14, 2012 at 9:44 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Alex Gaynor, 13.12.2012 08:43:
Out of curiosity Stefan, if we had an alternate C-API with similar methods (e.g. PyPyList_Append or so), but different signatures and memory model, how hard do you think it would be to have Cython support this?
Impossible to say in that generality. If it's only about exchanging C functions, it should be doable, but if it has an impact on Cython's type system, it might turn into a horrible mess to come up with something that works in both CPython and PyPy.
Also note that Cython knows a lot about reference counting internally. If that alternate C-API requires substantial changes to the way references are maintained in the C code, that would mean some work.
Also note that the amount of Cython code out there that uses explicit C-API calls for one reason or another is most likely rather large.
All in all, I'm not a fan of that one big revolution that will make everything beautiful, fast and shiny, but that will never happen, really. I prefer small steps that make things work.
Stefan
I don't want to be a naysayer here, but supporting the CPython C API is a mess. I don't think there is a way to make it nice and shiny (no matter what) or a way to make incremental improvements that lead anywhere good.

That said, I agree that exposing a different C API is not solving much, while adding burden to maintainers of both Cython and PyPy and I'm generally against the idea.

What can be done is keeping refcounts on Python objects and then growing a few fields for keeping the C stuff forever. I can even think of a scheme that would do it with only a bit of a mess. This would require storing an extra field on all objects, but I can think of a scheme to have this done only when invoking cpyext for the first time. If we have a special pointer, we can allocate an object in the old generation that's tied to the original object. It has a refcount, where 0 means it goes away. Since it's not movable, you can take a pointer to it and pass it to C. It's also a root, but a special kind of root where, during collection, refcount == 0 means it dies away.

The objects have references to each other, so:

* PyPy object keeps the C object alive for the entire lifetime of the pypy object.
* C object keeps the PyPy object alive as long as its refcount is not 0 during collection time.

What we get:

* Simple refcounting (can be implemented in C as macros even)
* Lack of dictionaries

What we lose:

* We need to implement it (so, time) and it requires a little bit of GC complication.

Cheers,
fijal
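A rough Python model of the pairing described above, purely illustrative (the names are invented here and this is not how cpyext is actually written):

    class CCompanion(object):
        """Stands in for the non-movable C-level struct with its own plain refcount."""
        def __init__(self, pypy_obj):
            self.refcount = 0
            self.pypy_obj = pypy_obj       # the C object keeps the PyPy object alive...

    class PyPyObject(object):
        def __init__(self):
            self.c_companion = None        # ...and the PyPy object keeps the C object alive

        def get_companion(self):
            if self.c_companion is None:   # allocated lazily, on first crossing into C
                self.c_companion = CCompanion(self)
            return self.c_companion

    def incref(c):
        c.refcount += 1                    # plain increment, no zero check needed here

    def decref(c):
        c.refcount -= 1                    # the zero check is deferred to collection time

    def collect(companions):
        """At GC time, companions whose refcount dropped to 0 simply die away."""
        return [c for c in companions if c.refcount > 0]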

Hi Fijal, On Fri, Dec 14, 2012 at 9:04 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
That said, I agree that exposing a different C API is not solving much, while adding burden to maintainers of both Cython and PyPy and I'm generally against the idea.
I tend to agree with this point (so I disagree with what Leonardo said just now, on this front :-). (skipped explanation...)
What we get:
* Simple refcounting (can be implemented in C as macros even)
* Lack of dictionaries
It sounds like a good model to me. If I'm right, refcounting is even simpler than in CPython, because all it needs is to increment or decrement the refcount, but not check for zero. This check is done only during collection.

About "Lack of dictionaries", it's unclear: as a first approximation to your idea, we could simply keep a dictionary going from the PyPy object to the C object. (We don't need the opposite direction.) A bientôt, Armin.

On Mon, Dec 17, 2012 at 3:46 PM, Armin Rigo <arigo@tunes.org> wrote:
Hi Fijal,
On Fri, Dec 14, 2012 at 9:04 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
That said, I agree that exposing a different C API is not solving much, while adding burden to maintainers of both Cython and PyPy and I'm generally against the idea.
I tend to agree with this point (so I disagree with what Leonardo said just now, on this front :-).
(skipped explanation...)
What we get:
* Simple refcounting (can be implemented in C as macros even)
* Lack of dictionaries
It sounds like a good model to me. If I'm right, refcounting is even simpler than in CPython, because all it needs is to increment or decrement the refcount, but not check for zero. This check is done only during collection.
About "Lack of dictionaries", it's unclear: as a first approximation to your idea, we could simply keep a dictionary going from the PyPy object to the C object. (We don't need the opposite direction.)
If you have a dictionary, then the pypy object does not keep the cpy object alive (or the dictionary keeps both of them alive, or some other stuff). Maybe we can do something like what we do now with hashes of objects: keep a dictionary while in the nursery and then add the link if it survives?

Hi Maciej, On Mon, Dec 17, 2012 at 3:46 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
If you have a dictionary, then the pypy object does not keep the cpy object alive (or the dictionary keeps both of them alive or some other stuff).
Obviously it should be a dictionary with weak keys. The values can be raw addresses anyway, as the CPyExt objects don't move.
Maybe we can do something like what we do now with hashes of objects: keep a dictionary while in the nursery and then add the link if it survives?
Yes, maybe, but first we should do it without this optimization. Also note that if a PyPy object was already old when we first asked for its CPyExt object, then we need the dictionary anyway. It's unclear how common the optimization case is: accessing a young object's CPyExt equivalent a few times (but not too often), and then continuing to access it (often) after the young object has been made old. A bientôt, Armin.
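A minimal sketch of that mapping (the names and the allocate callback are assumptions standing in for whatever cpyext actually does to create the non-moving C object):

    import weakref

    # Weak keys: the mapping never keeps PyPy objects alive. The values are raw
    # addresses of the non-moving CPyExt objects, so no weakref is needed on that side.
    _cpyext_map = weakref.WeakKeyDictionary()

    def c_object_for(pypy_obj, allocate):
        """Return the C-level address for pypy_obj, allocating a companion on first use."""
        try:
            return _cpyext_map[pypy_obj]
        except KeyError:
            addr = allocate(pypy_obj)       # assumed: returns a stable raw address (an int)
            _cpyext_map[pypy_obj] = addr
            return addr

Note that a weak-keyed mapping only works for objects that can carry weak references, which is exactly the constraint that comes up again further down the thread.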

On Fri, Dec 14, 2012 at 5:44 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
All in all, I'm not a fan of that one big revolution that will make everything beautiful, fast and shiny, but that will never happen, really. I prefer small steps that make things work.
This is the biggest impedance mismatch that I see in this discussion. Pypy is all about revolution, completely changing how the interpreter works to keep a clean language for the user (python). The same should happen in cython: a completely different backend for pypy, keeping only the nice cython language. I think there needs to be a fork of cython that doesn't have any idea of reference counting and uses a mixture of cffi and a clean c-api. But first pypy needs to have that c-api, right?

-- Leonardo Santagada

On Thu, Dec 13, 2012 at 9:35 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Maciej Fijalkowski, 12.12.2012 20:10:
On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
no. lxml uses cpyext (CPython extension compatibility) that is and will forever be slow.
Well, I don't think it would be hard for any PyPy core developer to make it twice as fast. Shouldn't be more than a day's work.
Stefan
I'm not so sure; we wouldn't know until someone tries it. What optimizations did you have in mind?

For what it's worth, cpyext is not twice as slow, lxml is. cpyext is likely 10-20x slower or so. I presume lowering the overhead would not automatically make lxml twice as fast, since it's doing quite a lot of other work. Anyway, without trying we don't really know.

Cheers,
fijal

Maciej Fijalkowski, 13.12.2012 09:13:
On Thu, Dec 13, 2012 at 9:35 AM, Stefan Behnel wrote:
Maciej Fijalkowski, 12.12.2012 20:10:
On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
no. lxml uses cpyext (CPython extension compatibility) that is and will forever be slow.
Well, I don't think it would be hard for any PyPy core developer to make it twice as fast. Shouldn't be more than a day's work.
I'm not so sure, we wouldn't know until someone tries it. What optimizations did you have in mind?
Anything that creates a proper fast-path in the ref-counting functions and that generally takes pressure off them, e.g. by keeping PyObjects alive in a weakref dict as long as the corresponding PyPy object lives, so that useless re-allocation cycles are avoided. I'm sure that really simple changes can bring a substantial improvement here.
For what is worth, cpyext is not twice as slow, lxml is. cpyext is likely 10-20x slower or so. I presume lowering the overhead would not automatically make lxml twice as fast, since it's doing quite a lot of other work.
lxml's API performance suffers a lot from object/reference creation and deallocation time, so making object deallocation faster and making it happen only when necessary would certainly improve the overall performance. Stefan

On Thu, Dec 13, 2012 at 7:21 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Maciej Fijalkowski, 13.12.2012 09:13:
On Thu, Dec 13, 2012 at 9:35 AM, Stefan Behnel wrote:
Maciej Fijalkowski, 12.12.2012 20:10:
On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
no. lxml uses cpyext (CPython extension compatibility) that is and will forever be slow.
Well, I don't think it would be hard for any PyPy core developer to make it twice as fast. Shouldn't be more than a day's work.
I'm not so sure, we wouldn't know until someone tries it. What optimizations did you have in mind?
Anything that creates a proper fast-path in the ref-counting functions and that generally takes pressure off them, e.g. by keeping PyObjects alive in a weakref dict as long as the corresponding PyPy object lives, so that useless re-allocation cycles are avoided. I'm sure that really simple changes can bring a substantial improvement here.
Short-term allocations are usually very cheap; dictionary lookups not necessarily so. Do you have any specific optimizations in mind? I don't see any easy way of doing it all.
For what is worth, cpyext is not twice as slow, lxml is. cpyext is likely 10-20x slower or so. I presume lowering the overhead would not automatically make lxml twice as fast, since it's doing quite a lot of other work.
lxml's API performance suffers a lot from object/reference creation and deallocation time, so making object deallocation faster and making it happen only when necessary would certainly improve the overall performance.
Stefan

Maciej Fijalkowski, 13.12.2012 19:23:
On Thu, Dec 13, 2012 at 7:21 PM, Stefan Behnel wrote:
Maciej Fijalkowski, 13.12.2012 09:13:
On Thu, Dec 13, 2012 at 9:35 AM, Stefan Behnel wrote:
Maciej Fijalkowski, 12.12.2012 20:10:
On Wed, Dec 12, 2012 at 7:06 PM, Joe Hillenbrand wrote:
I was able to fix the issue with scrapy.
https://github.com/joehillen/scrapy/commit/8778af5c5be50a5d746751352f8d710d1...
Unfortunately, scrapy takes twice as long in PyPy than in CPython. I suspect this is because lxml is twice as slow in PyPy vs CPython, which I found in lxml's benchmarks.
Should lxml be added to the set of speed tests?
no. lxml uses cpyext (CPython extension compatibility) that is and will forever be slow.
Well, I don't think it would be hard for any PyPy core developer to make it twice as fast. Shouldn't be more than a day's work.
I'm not so sure, we wouldn't know until someone tries it. What optimizations did you have in mind?
Anything that creates a proper fast-path in the ref-counting functions and that generally takes pressure off them, e.g. by keeping PyObjects alive in a weakref dict as long as the corresponding PyPy object lives, so that useless re-allocation cycles are avoided. I'm sure that really simple changes can bring a substantial improvement here.
short term allocations are usually very cheap.
In the profile I posted the last time we discussed this, I think it was pretty clear that most of the time is not currently being spent in allocation but in deallocation. More than 50% of the overall runtime in this case:

http://cython.org/callgrind-pypy-nbody.png

Plus, connecting the lifetime of PyObjects to that of PyPy objects would fix the problem that PyObjects can die prematurely and take C state with them:

http://docs.cython.org/src/userguide/pypy.html

My intuition was to add a fastpath to Py_DECREF() that would do (close to) nothing if the PyPy object is still alive. Either that, or move this whole decision into C by somehow increasing the C level refcount during the lifetime of the PyPy object and decreasing it when the PyPy object dies.

The latter approach (if doable) is obviously preferable from a C point of view because it would improve the hit count of the "common case" tests in the INCREF/DECREF C macros, thus avoiding unnecessary calls into PyPy altogether by using inlined code. That would give it about the same speed as in CPython for objects that are reused in C code more than once and for which a PyPy object reference exists (certainly not an unusual case).

Stefan
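Illustrative pseudocode (written in Python for readability) for the first variant, the Py_DECREF() fast path; the flag name and the dealloc helper are assumptions, and real cpyext would do this with C macros:

    class PyObjectStub(object):
        """Stand-in for a C-level PyObject; pypy_object_alive is an assumed flag."""
        def __init__(self):
            self.refcnt = 1
            self.pypy_object_alive = True   # would be kept in sync by the PyPy GC

    def dealloc(obj):
        # placeholder for freeing the C-level state
        pass

    def py_decref(obj):
        obj.refcnt -= 1
        if obj.refcnt > 0:
            return                          # common case: cheap and inlinable
        if obj.pypy_object_alive:
            obj.refcnt = 1                  # the live PyPy object still owns a reference,
            return                          # so do (close to) nothing instead of deallocating
        dealloc(obj)                        # only now pay for the expensive path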

2012/12/13 Stefan Behnel <stefan_ml@behnel.de>
My intuition was to add a fastpath to Py_DECREF() that would do (close to) nothing if the PyPy object is still alive. Either that, or move this whole decision into C by somehow increasing the C level refcount during the lifetime of the PyPy object and decreasing it when the PyPy object dies.
It may be difficult, because most standard types don't have a __del__, and I'm not sure we can attach a weak reference.
The latter approach (if doable) is obviously preferable from a C point of view because it would improve the hit-count of the "common case" tests in the INCREF/DECREF C macros, thus avoiding unnecessary calls into PyPy all together by using inlined code. That would give it about the same speed as in CPython for objects that are being reused in C code more than once for which a PyPy object reference exists (certainly not an unusual case).
-- Amaury Forgeot d'Arc

On Thu, Dec 13, 2012 at 10:47 PM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
2012/12/13 Stefan Behnel <stefan_ml@behnel.de>
My intuition was to add a fastpath to Py_DECREF() that would do (close to) nothing if the PyPy object is still alive. Either that, or move this whole decision into C by somehow increasing the C level refcount during the lifetime of the PyPy object and decreasing it when the PyPy object dies.
It may be difficult, because most standard types don't have a __del__, and I'm not sure we can attach a weak reference.
having tons of weakrefs is also a bad idea (or tons of objects with __del__s)
The latter approach (if doable) is obviously preferable from a C point of view because it would improve the hit-count of the "common case" tests in the INCREF/DECREF C macros, thus avoiding unnecessary calls into PyPy all together by using inlined code. That would give it about the same speed as in CPython for objects that are being reused in C code more than once for which a PyPy object reference exists (certainly not an unusual case).
-- Amaury Forgeot d'Arc

Maciej Fijalkowski, 13.12.2012 22:21:
On Thu, Dec 13, 2012 at 10:47 PM, Amaury Forgeot d'Arc wrote:
2012/12/13 Stefan Behnel
My intuition was to add a fastpath to Py_DECREF() that would do (close to) nothing if the PyPy object is still alive. Either that, or move this whole decision into C by somehow increasing the C level refcount during the lifetime of the PyPy object and decreasing it when the PyPy object dies.
It may be difficult, because most standard types don't have a __del__, and I'm not sure we can attach a weak reference.
If you can't attach a weakref to an object, that sounds more like a bug to me, especially in PyPy.
having tons of weakrefs is also a bad idea (or tons of objects with __del__s)
So, I take it that it's a tradeoff - not unusual for optimisation. Why not just give it a try? Stefan

2012/12/14 Stefan Behnel <stefan_ml@behnel.de>
Maciej Fijalkowski, 13.12.2012 22:21:
On Thu, Dec 13, 2012 at 10:47 PM, Amaury Forgeot d'Arc wrote:
2012/12/13 Stefan Behnel
My intuition was to add a fastpath to Py_DECREF() that would do (close to) nothing if the PyPy object is still alive. Either that, or move this whole decision into C by somehow increasing the C level refcount during the lifetime of the PyPy object and decreasing it when the PyPy object dies.
It may be difficult, because most standard types don't have a __del__, and I'm not sure we can attach a weak reference.
If you can't attach a weakref to an object, that sounds more like a bug to me, especially in PyPy.
Try this on any Python:
weakref.ref(3)
Supporting weak references is not cheap, especially for short lived objects. I'm sure you don't want to slow down all strings and numbers that are *not* passed to extension modules.
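Concretely, on CPython (and, as far as I know, on PyPy as well) that one-liner raises a TypeError:

    import weakref

    try:
        weakref.ref(3)
    except TypeError as e:
        print(e)    # e.g. "cannot create weak reference to 'int' object"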
having tons of weakrefs is also a bad idea (or tons of objects with __del__s)
So, I take it that it's a tradeoff - not unusual for optimisation. Why not just give it a try?
-- Amaury Forgeot d'Arc
participants (7)

- Alex Gaynor
- Amaury Forgeot d'Arc
- Armin Rigo
- Joe Hillenbrand
- Leonardo Santagada
- Maciej Fijalkowski
- Stefan Behnel