Another reference count leak: ticket #848
The attached patch fixes another reference count leak in the use of
PyArray_DescrFromType.
Could I ask that both this patch and my earlier one (ticket #843) be
applied to subversion. Thank you.
Definitely not enjoying this low level code.
commit 80e1aca1725dd4cd8e091126cf515c39ac3a33ff
Author: Michael Abbott
On Tue, Jul 8, 2008 at 3:35 AM, Michael Abbott
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
Could I ask that both this patch and my earlier one (ticket #843) be applied to subversion. Thank you.
I'll take a look at them.
Definitely not enjoying this low level code.
It can leave one a bit boggled, no? Chuck
Michael Abbott wrote:
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
The first part of this patch is good. The second is not needed. Also, it would be useful if you could write a test case that shows what is leaking and how you determined that it is leaking.
Could I ask that both this patch and my earlier one (ticket #843) be applied to subversion. Thank you.
Definitely not enjoying this low level code.
What doesn't kill you makes you stronger :-) But, you are correct that reference counting is a bear. -Travis
Hi Travis,
On Tue, Jul 8, 2008 at 11:26 AM, Travis E. Oliphant
Michael Abbott wrote:
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
The first part of this patch is good. The second is not needed. Also, it would be useful if you could write a test case that shows what is leaking and how you determined that it is leaking.
Could I ask that both this patch and my earlier one (ticket #843) be applied to subversion. Thank you.
Definitely not enjoying this low level code.
What doesn't kill you makes you stronger :-) But, you are correct that reference counting is a bear.
Could you backport your fixes to 1.1.x also? Chuck
On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
Michael Abbott wrote:
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
The first part of this patch is good. The second is not needed. I don't see that. The second part of the patch addresses the case of an early return: this means that the DECREF that occurs later on in the code is bypassed, and so a reference leak will still occur if this early return case occurs. Don't forget that PyArray_DescrFromType returns an incremented reference that has to be decremented, returned or explicitly assigned -- the DECREF obligation has to be met somewhere.
Also, it would be useful if you could write a test case that shows what is leaking and how you determined that it is leaking.
Roughly r = range(n) i = 0 refs = 0 refs = sys.gettotalrefcount() for i in r: float32() print refs - sys.gettotalrefcount() in debug mode python. This isn't quite the whole story (reference counts can be annoyingly fluid), but that's the most of it. In trunk this leaks 2 refs per n, with the attached patch there remains one leak I haven't chased down yet. Is there a framework for writing test cases? I'm constructing tests just to pin down leaks that I find in my application (uses numpy and ctypes extensively), so they're terribly ad-hoc at the moment. I'm chasing down leaks in a leaky application (it ate 1G over a weekend), and numpy is just one source of leaks -- I suspec that the two reference count leaks I've identified so far don't actually leak significant memory, so alas my search continues...
Definitely not enjoying this low level code. What doesn't kill you makes you stronger :-) Heh.
But, you are correct that reference counting is a bear. Actually, it's not the reference counting I'm grumbling about -- the rules for reference counting are, on the whole, pretty straightforward.
On Tue, Jul 8, 2008 at 16:23, Michael Abbott
On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
Michael Abbott wrote:
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
The first part of this patch is good. The second is not needed. I don't see that. The second part of the patch addresses the case of an early return: this means that the DECREF that occurs later on in the code is bypassed, and so a reference leak will still occur if this early return case occurs. Don't forget that PyArray_DescrFromType returns an incremented reference that has to be decremented, returned or explicitly assigned -- the DECREF obligation has to be met somewhere.
Also, it would be useful if you could write a test case that shows what is leaking and how you determined that it is leaking.
Roughly
r = range(n) i = 0 refs = 0 refs = sys.gettotalrefcount() for i in r: float32() print refs - sys.gettotalrefcount()
in debug mode python. This isn't quite the whole story (reference counts can be annoyingly fluid), but that's the most of it. In trunk this leaks 2 refs per n, with the attached patch there remains one leak I haven't chased down yet.
Is there a framework for writing test cases? I'm constructing tests just to pin down leaks that I find in my application (uses numpy and ctypes extensively), so they're terribly ad-hoc at the moment.
If you can measure the leak in-process with sys.getrefcount() and friends on a standard non-debug build of Python, then it might be useful for you to write a unit test for us. For example, in numpy/core/tests/test_regression.py, you can see several tests involving reference counts. If the leak isn't particularly measurable without resorting to top(1), then don't bother trying to make a unit test, but a small, complete example that demonstrates the leak is very useful for the rest of us to see if the problem exists on our systems and so we can try our hands at fixing it, too. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Tue, Jul 8, 2008 at 3:23 PM, Michael Abbott
On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
Michael Abbott wrote:
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
The first part of this patch is good. The second is not needed. I don't see that. The second part of the patch addresses the case of an early return: this means that the DECREF that occurs later on in the code is bypassed, and so a reference leak will still occur if this early return case occurs. Don't forget that PyArray_DescrFromType returns an incremented reference that has to be decremented, returned or explicitly assigned -- the DECREF obligation has to be met somewhere.
Some function calls do the DECREF on an error return. I haven't looked, but that might be the case here. Chuck
Michael Abbott wrote:
On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
Michael Abbott wrote:
The attached patch fixes another reference count leak in the use of PyArray_DescrFromType.
The first part of this patch is good. The second is not needed.
I don't see that. The second part of the patch addresses the case of an early return: this means that the DECREF that occurs later on in the code is bypassed, and so a reference leak will still occur if this early return case occurs. Don't forget that PyArray_DescrFromType returns an incremented reference that has to be decremented, returned or explicitly assigned -- the DECREF obligation has to be met somewhere.
Don't forget that PyArray_FromAny consumes the reference even if it returns with an error. -Travis
There are three separate patches in this message plus some remarks on "stealing" reference counts at the bottom. On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
Michael Abbott wrote:
On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
The first part of this patch is good. The second is not needed. I don't see that. Don't forget that PyArray_FromAny consumes the reference even if it returns with an error.
Oh dear. That's not good.
Well then, I need to redo my patch. Here's the new patch for
..._arrtype_new:
commit 431d99f40ca200201ba59c74a88b0bd972022ff0
Author: Michael Abbott
On Wed, 9 Jul 2008, Michael Abbott wrote:
Well then, I need to redo my patch. Here's the new patch for ..._arrtype_new:
I'm sorry about this, I posted too early. Here is the final patch (and
I'll update the ticket accordingly).
commit a1ff570cbd3ca6c28f87c55cebf2675b395c6fa0
Author: Michael Abbott
I really don't think that this design of reference count handling in PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea. Unfortunately these seem to be part of the published API, so presumably it's too late to change this? (Otherwise I might see how the corresponding patch comes out.)
Changing it would break almost every code using numpy C API ... Don't forget that numpy did build on numarray and numeric, with an almost backward compatible C API. cheers David
On Wed, Jul 9, 2008 at 2:36 AM, Michael Abbott
There are three separate patches in this message plus some remarks on "stealing" reference counts at the bottom.
<snip>
I really don't think that this design of reference count handling in PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea. Unfortunately these seem to be part of the published API, so presumably it's too late to change this? (Otherwise I might see how the corresponding patch comes out.)
There was a previous discussion along those lines initiated by, ahem, myself. But changing things at this point would be too much churn. Better, I think, to get these things documented and the code cleaned up. Chuck
Looking at the uses of PyArray_FromAny I can see the motivation for this design: core/include/numpy/ndarrayobject.h has a lot of calls which take a value returned by PyArray_DescrFromType as argument. This has prompted me to take a trawl through the code to see what else is going on, and I note a couple more issues with patches below.
The core issue is that NumPy grew out of Numeric. In Numeric PyArray_Descr was just a C-structure, but in NumPy it is now a real Python object with reference counting. Trying to have a compatible C-API to the old one and making the transition with out huge changes to the C-API is what led to the "stealing" strategy. I did not just out the blue decide to do it that way. Yes, it is a bit of a pain, and yes, it isn't the way it is always done, but as you point out there are precedents, and so that's the direction I took. It is *way* too late to change that in any meaningful way
In the patch below the problem being fixed is that the first call to PyArray_FromAny can result in the erasure of dtype *before* Py_INCREF is called. Perhaps you can argue that this only occurs when NULL is returned...
Indeed I would argue that because the array object holds a reference to the typecode (data-type object). Only if the call returns NULL does the data-type object lose it's reference count, and in fact that works out rather nicely and avoids a host of extra Py_DECREFs.
The next patch deals with an interestingly subtle memory leak in _string_richcompare where if casting to a common type fails then a reference count will leaked. Actually this one has nothing to do with PyArray_FromAny, but I spotted it in passing.
This is a good catch. Thanks!
I really don't think that this design of reference count handling in PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea.
Not only is this not a good idea, it's not documented in the API documentation (I'm referring to the "Guide to NumPy" book) Hmmm... Are you sure? I remember writing a bit about in the paragraphs
Your point is well noted, but again given the provenance of the code, I still think it was the best choice. And, yes, it is too late to change it. that describe the relevant API calls. But, you could be right.
I've been trying to find some documentation on stealing references. The Python C API reference (http://docs.python.org/api/refcountDetails.html) says
Few functions steal references; the two notable exceptions are PyList_SetItem() and PyTuple_SetItem()
An interesting essay on reference counting is at http://lists.blender.org/pipermail/bf-python/2005-September/003092.html
Believe me, I understand reference counting pretty well. Still, it can be tricky to do correctly and it is easy to forget corner cases and error-returns. I very much appreciate your careful analysis, but I did an analysis of my own when I wrote the code, and so I will be resistant to change things if I can't see the error. I read something from Guido once that said something to the effect that nothing beats studying the code to get reference counting right. I think this is true.
In conclusion, I can't find much about the role of stealing in reference count management, but it's such a source of surprise (and frankly doesn't actually work out all that well in numpy) I strongly beg to differ. This sounds very naive to me. IMHO it has worked out extremely well in converting the PyArray_Descr C-structure into the data-type objects that adds so much power to NumPy.
Yes, there are a few corner cases that you have done an excellent job in digging up, but they are "corner" cases that don't cause problems for 99.9% of the use-cases. -Travis
participants (5)
-
Charles R Harris
-
David Cournapeau
-
Michael Abbott
-
Robert Kern
-
Travis E. Oliphant