
I ran valgrind 1.0.3 on Python 2.2.2. Well, almost 2.2.2, it was from 12 Oct. After 90+ minutes and over 512 MB of RAM, there are no new/major issues to report.

The complete valgrind log (231k) can be found here:
http://www.metaslash.com/py/valgrind-2_2_2.txt

I've cleaned up the log (200k) to remove some of the uninteresting stuff:
http://www.metaslash.com/py/valgrind-clean-2_2_2.txt

There are a few small memory leaks. I think most of the leaks are related to threads. The report contains memory still in use as well as leaks. To find memory leaks, search for ' lost ' without quotes. It's quite possible some of the memory still in use is due to reference leaks.

Note: one common leak is from readline. Some of the read errors are from GLIBC.

I had to skip test_pty, I think it caused the tests to hang. test_commands and test_popen2 fail when running valgrind. test_popen2 is due to valgrind output going to stdout/stderr.

Neal

I looked at this one only.
This seems to be the most worrisome:

==28827== 268980 bytes in 15 blocks are possibly lost in loss record 100 of 106
==28827==    at 0x400487CD: realloc (vg_clientfuncs.c:270)
==28827==    by 0x80994DC: _PyObject_GC_Resize (Modules/gcmodule.c:917)
==28827==    by 0x80BD46E: PyFrame_New (Objects/frameobject.c:264)
==28827==    by 0x80782C3: PyEval_EvalCodeEx (Python/ceval.c:2390)
==28827==    by 0x807AA43: fast_function (Python/ceval.c:3173)
==28827==    by 0x80779B5: eval_frame (Python/ceval.c:2034)
==28827==    by 0x807892B: PyEval_EvalCodeEx (Python/ceval.c:2595)
==28827==    by 0x807AA43: fast_function (Python/ceval.c:3173)
==28827==    by 0x80779B5: eval_frame (Python/ceval.c:2034)
==28827==    by 0x807892B: PyEval_EvalCodeEx (Python/ceval.c:2595)

There are a few other records mentioning GC_Resize, but this one is the biggest. Could it be that the free frame list is botched? OTOH, what does "possibly lost" really mean?

There are also a few fingers pointing in the direction of weakref_ref, e.g.

==28827== 520 bytes in 14 blocks are possibly lost in loss record 48 of 106
==28827==    at 0x400481B4: malloc (vg_clientfuncs.c:100)
==28827==    by 0x8099519: _PyObject_GC_New (Modules/gcmodule.c:868)
==28827==    by 0x8067BA5: PyWeakref_NewRef (Objects/weakrefobject.c:37)
==28827==    by 0x8066119: add_subclass (Objects/typeobject.c:2249)
==28827==    by 0x8061F29: PyType_Ready (Objects/typeobject.c:2219)
==28827==    by 0x80605A8: type_new (Objects/typeobject.c:1280)
==28827==    by 0x805EDA4: type_call (Objects/typeobject.c:183)
==28827==    by 0x80ABB0C: PyObject_Call (Objects/abstract.c:1688)
==28827==    by 0x807A34F: PyEval_CallObjectWithKeywords (Python/ceval.c:3058)
==28827==    by 0x80AB0C6: PyObject_CallFunction (Objects/abstract.c:1679)

Of course many of these could be caused by a single leak that drops a pointer to a container -- then everything owned by that container is also leaked.
I noticed this one:

==28713== 572 bytes in 15 blocks are possibly lost in loss record 39 of 78
==28713==    at 0x400481B4: malloc (vg_clientfuncs.c:100)
==28713==    by 0x8099519: _PyObject_GC_New (Modules/gcmodule.c:868)
==28713==    by 0x80B2D09: PyMethod_New (Objects/classobject.c:2008)
==28713==    by 0x80AF837: instance_getattr2 (Objects/classobject.c:702)
==28713==    by 0x80AF73A: instance_getattr1 (Objects/classobject.c:676)
==28713==    by 0x80B30F1: instance_getattr (Objects/classobject.c:715)
==28713==    by 0x80577A2: PyObject_GetAttr (Objects/object.c:1108)
==28713==    by 0x80B1731: half_cmp (Objects/classobject.c:1503)
==28713==    by 0x80B1937: instance_compare (Objects/classobject.c:1572)
==28713==    by 0x8055A6E: try_3way_compare (Objects/object.c:477)

which led me to an easy-to-fix leak in half_cmp(), both in 2.2.2 and 2.3:

diff -c -c -r2.154.8.1 classobject.c
*** classobject.c 13 Jun 2002 21:36:35 -0000 2.154.8.1
--- classobject.c 18 Oct 2002 00:36:06 -0000
***************
*** 1507,1514 ****
      }

      args = Py_BuildValue("(O)", w);
!     if (args == NULL)
          return -2;

      result = PyEval_CallObject(cmp_func, args);
      Py_DECREF(args);
--- 1507,1516 ----
      }

      args = Py_BuildValue("(O)", w);
!     if (args == NULL) {
!         Py_DECREF(cmp_func);
          return -2;
+     }

      result = PyEval_CallObject(cmp_func, args);
      Py_DECREF(args);

but somehow I don't think that caused the report, because this exit can only be taken if there's a memory error. (Hm... or if w == NULL upon entry? How could that happen?) A similar one is on half_binop.

--Guido van Rossum (home page: http://www.python.org/~guido/)
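The shape of the half_cmp() fix can be sketched without the interpreter. The following is only a toy model, not CPython code: toy_obj, toy_get_cmp_func() and the two half_cmp_* functions are hypothetical names made up for illustration, and the "reference count" is a plain int. It shows why an early return must release a reference the function already owns.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted object; NOT the real PyObject. */
typedef struct { int refcnt; } toy_obj;

void toy_incref(toy_obj *o) { o->refcnt++; }
void toy_decref(toy_obj *o) { assert(o->refcnt > 0); o->refcnt--; }

/* Hypothetical lookup that hands back a NEW reference the caller owns. */
toy_obj *toy_get_cmp_func(toy_obj *method)
{
    toy_incref(method);
    return method;
}

/* Leaky shape: the early return forgets the reference it owns. */
int half_cmp_leaky(toy_obj *method, int args_ok)
{
    toy_obj *cmp_func = toy_get_cmp_func(method);
    if (!args_ok)
        return -2;              /* cmp_func is never released */
    toy_decref(cmp_func);
    return 0;
}

/* Fixed shape: every exit path releases the owned reference. */
int half_cmp_fixed(toy_obj *method, int args_ok)
{
    toy_obj *cmp_func = toy_get_cmp_func(method);
    if (!args_ok) {
        toy_decref(cmp_func);
        return -2;
    }
    toy_decref(cmp_func);
    return 0;
}

/* Reference count left on the method after one failing call. */
int refs_after_failure(int use_fixed)
{
    toy_obj method = { 1 };     /* one reference held by this function */
    if (use_fixed)
        half_cmp_fixed(&method, 0);
    else
        half_cmp_leaky(&method, 0);
    return method.refcnt;
}
```

In this model refs_after_failure(0) leaves one extra reference behind and refs_after_failure(1) does not. As noted above, the failing branch is rarely taken in practice, which is why a leak of this shape may not explain the valgrind record.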

This one seems simple:

diff -c -c -r2.126.4.25 typeobject.c
*** typeobject.c 11 Oct 2002 00:22:22 -0000 2.126.4.25
--- typeobject.c 18 Oct 2002 00:50:31 -0000
***************
*** 2249,2256 ****
      while (--i >= 0) {
          ref = PyList_GET_ITEM(list, i);
          assert(PyWeakref_CheckRef(ref));
!         if (PyWeakref_GET_OBJECT(ref) == Py_None)
!             return PyList_SetItem(list, i, new);
      }
      i = PyList_Append(list, new);
      Py_DECREF(new);
--- 2249,2259 ----
      while (--i >= 0) {
          ref = PyList_GET_ITEM(list, i);
          assert(PyWeakref_CheckRef(ref));
!         if (PyWeakref_GET_OBJECT(ref) == Py_None) {
!             i = PyList_SetItem(list, i, new);
!             Py_DECREF(new);
!             return i;
!         }
      }
      i = PyList_Append(list, new);
      Py_DECREF(new);

--Guido van Rossum (home page: http://www.python.org/~guido/)

This one seems simple: [patch to typeobject.c]
Deceptively so -- I forgot that PyList_SetItem(list, i, new) steals a reference to new. :-( The search is still on... --Guido van Rossum (home page: http://www.python.org/~guido/)
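The difference between a call that steals a reference (PyList_SetItem) and one that takes its own (PyList_Append) is what sinks the patch above. Here is a rough sketch with a toy refcount model; toy_obj, toy_list and all function names are invented for illustration and are not part of the Python C API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; NOT the real PyObject or PyListObject. */
typedef struct { int refcnt; } toy_obj;
typedef struct { toy_obj *slot; } toy_list;   /* a one-slot "list" */

/* Steals the caller's reference, in the manner of PyList_SetItem:
   no increment; the previous occupant's reference is dropped. */
void toy_set_item_steal(toy_list *list, toy_obj *item)
{
    if (list->slot != NULL)
        list->slot->refcnt--;
    list->slot = item;
}

/* Takes its own reference, in the manner of PyList_Append:
   the caller still owns the reference it came in with. */
void toy_store_newref(toy_list *list, toy_obj *item)
{
    if (list->slot != NULL)
        list->slot->refcnt--;
    item->refcnt++;
    list->slot = item;
}

/* Steal, then DECREF anyway (the shape of the withdrawn patch). */
int refs_after_steal_and_decref(void)
{
    toy_obj item = { 1 };        /* caller owns one reference */
    toy_list list = { NULL };
    toy_set_item_steal(&list, &item);
    item.refcnt--;               /* one DECREF too many */
    assert(list.slot == &item);  /* the list still points at the item */
    return item.refcnt;
}

/* Store with a new reference, then DECREF (the correct pairing). */
int refs_after_newref_and_decref(void)
{
    toy_obj item = { 1 };
    toy_list list = { NULL };
    toy_store_newref(&list, &item);
    item.refcnt--;               /* releases only the caller's reference */
    return item.refcnt;
}
```

In the first case the count drops to zero while the list still holds the pointer, so in a real interpreter the object would be freed out from under the list. In the second the list's own reference keeps the object alive.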

Just been lurkin' :) Possibly lost means that Valgrind didn't see any freeing for a specific pointer after a malloc. In valgrind terms, you probably forgot to free a pointer and lost all pointers that could reference that memory. In other words, a leak :) Usually 75% of those errors can be solved locally within the function, although I haven't looked at the specifics, but that 25% can get nasty.

-- Mike

On Thu, Oct 17 @ 20:38, Guido van Rossum wrote:
-- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html

Booh. That statistic doesn't apply to Python -- there are no "simple" leaks left in Python's C code, the remaining ones are all nasty. Also, valgrind carefully distinguishes between "definitely lost" and "possibly lost". What's the difference? --Guido van Rossum (home page: http://www.python.org/~guido/)

This should help shed some light on the situation. Quoth the docs:

"""
For each such block, Valgrind scans the entire address space of the process, looking for pointers to the block. One of three situations may result:

* A pointer to the start of the block is found. This usually indicates programming sloppiness; since the block is still pointed at, the programmer could, at least in principle, free'd it before program exit.

* A pointer to the interior of the block is found. The pointer might originally have pointed to the start and have been moved along, or it might be entirely unrelated. Valgrind deems such a block as "dubious", that is, possibly leaked, because it's unclear whether or not a pointer to it still exists.

* The worst outcome is that no pointer to the block can be found. The block is classified as "leaked", because the programmer could not possibly have free'd it at program exit, since no pointer to it exists. This might be a symptom of having lost the pointer at some earlier point in the program.
"""

Possibly lost is the second case and definitely lost is the third case. The definitely lost, in my experience, tends to mean you just forgot to free a pointer. The possibly lost usually means that some memory rot occurred, where it's not clear which pointer is causing the mem leak.

-- Mike

On Mon, Oct 28 @ 19:37, Guido van Rossum wrote:
-- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html
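The three cases in the quoted docs amount to a small classification rule. A minimal sketch follows, assuming the scanner has already collected the pointer-sized words it found in the address space. This is not valgrind's actual implementation; the enum, classify_block() and demo_classify() are hypothetical names used only to illustrate the rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Labels for the three situations described in the quoted docs. */
enum block_status { START_POINTER_FOUND, POSSIBLY_LOST, DEFINITELY_LOST };

/* Classify one heap block [start, start + size) against a list of
   pointer-sized words found elsewhere in the process. */
enum block_status classify_block(uintptr_t start, size_t size,
                                 const uintptr_t *words, size_t nwords)
{
    int interior_seen = 0;
    assert(words != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++) {
        if (words[i] == start)
            return START_POINTER_FOUND;      /* case 1 */
        if (words[i] > start && words[i] - start < size)
            interior_seen = 1;               /* candidate for case 2 */
    }
    return interior_seen ? POSSIBLY_LOST : DEFINITELY_LOST;
}

/* Classify a 64-byte block at the made-up address 0x2000, given one
   unrelated word plus one candidate word. */
enum block_status demo_classify(uintptr_t candidate)
{
    const uintptr_t words[2] = { 0x1000, candidate };
    return classify_block(0x2000, 64, words, 2);
}
```

A pointer 12 bytes into the block lands in the second case even if it is the only way the program ever refers to that block, which matters for allocators that hand out addresses past a header.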

This should help shed some light on the situation:
Thanks; this indeed helps.
Booh again. Lots of globals get initialized with pointers to malloc'ed blocks that are never freed. These are never called "leaks" in other leak detectors, just "alive at exit". I think valgrind actually doesn't call these leaks either.
Aha! This may be the case. When an object has a GC header, all pointers to the object point to an address 12 bytes into the block, which is where the "object" layout begins. Normally, there should be at least one pointer to the start of the block from one of the GC chains, but objects don't have to be in a chain at all. (I wonder if pymalloc adds to the confusion, since its arenas count as a single block to malloc and hence to valgrind, but are internally cut up into many objects.)
This is a true leak.
How much Python extension coding (in C) have you done? In Python, it almost never is a matter of forgetting to free() -- it's usually a matter of forgetting to DECREF, and sometimes a matter of doing an unnecessary INCREF. --Guido van Rossum (home page: http://www.python.org/~guido/)
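The GC-header point above can be illustrated with a toy allocator that prepends a header and hands out the address just past it. This is only a sketch: TOY_HEADER_SIZE reuses the 12-byte figure quoted in the message, and toy_gc_alloc(), toy_gc_free() and demo_round_trip() are made-up names, not the real _PyObject_GC_New machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header size; 12 is the figure quoted in the thread. */
#define TOY_HEADER_SIZE 12

/* Allocate header + object and return the address just past the header.
   The caller never sees the address malloc() returned.  The result is
   only suitable for byte access here; a real allocator would also keep
   the object properly aligned. */
void *toy_gc_alloc(size_t object_size)
{
    unsigned char *block;
    assert(object_size > 0);
    block = malloc(TOY_HEADER_SIZE + object_size);
    return block == NULL ? NULL : block + TOY_HEADER_SIZE;
}

/* Step back over the header to recover the address malloc() returned. */
void toy_gc_free(void *object)
{
    if (object != NULL)
        free((unsigned char *)object - TOY_HEADER_SIZE);
}

/* Allocate, touch both ends of the object, free.
   Returns 1 on success, -1 if malloc failed. */
int demo_round_trip(void)
{
    unsigned char *obj = toy_gc_alloc(32);
    int ok;
    if (obj == NULL)
        return -1;
    obj[0] = 'x';
    obj[31] = 'y';
    ok = (obj[0] == 'x' && obj[31] == 'y');
    toy_gc_free(obj);
    return ok;
}
```

While such an object is alive, the only pointer the program holds is an interior one. Unless something else (a GC chain, in the real interpreter) also stores the block's start address, a scanner would see no pointer to the start of the block and could report it as possibly lost even though it is fully in use.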

[Guido]
For each arena, the address returned by malloc() is stored in the file-static arenas[] vector. So if a diagnostic program can find that vector, it can find the base address of every arena gotten from malloc. In a debug build, though, pymalloc adds pad bytes to both ends of each request (and whether handled by pymalloc or by malloc!), and returns the address of the byte beyond the leading pad byte. This can leave any number of system-malloc blocks with no direct pointer to their start. That's specific to the debug build, which forces all Python mem API calls to go thru pymalloc (the release build only directs PyObject_{Malloc,etc}() calls to pymalloc).

On Mon, Oct 28 @ 20:15, Guido van Rossum wrote:
I don't think valgrind reports this first case. Only the second and third, from what I gathered in the docs.
I've done a fair bit, but to explain valgrind things, I figured it was best to talk the valgrind talk. Either way, it comes to the same thing. Glad that helped. -- Mike -- Michael Gilfix mgilfix@eecs.tufts.edu For my gpg public key: http://www.eecs.tufts.edu/~mgilfix/contact.html

participants (5): Guido van Rossum, Jeff Epler, Michael Gilfix, Neal Norwitz, Tim Peters