head crashing (was: Fwd: [Python-checkins] buildbot warnings in x86 mvlgcc trunk)

This is the third time I've seen a crash on 2 different machines. This is the first time I noticed this unexplained crash:
http://python.org/dev/buildbot/all/amd64%20gentoo%20trunk/builds/1983/step-t...
That was at r54982.
I tried to reproduce this: with a non-debug build, with a debug build, with valgrind with both types of build. I could never reproduce it. Valgrind did not report any errors either.
Here is the third failure:
http://python.org/dev/buildbot/all/amd64%20gentoo%20trunk/builds/1986/step-t...
The failure below prints:
    python: Objects/obmalloc.c:746: PyObject_Malloc: Assertion `bp != ((void *)0)' failed.
which probably doesn't really help since the corruption has already occurred. See http://python.org/dev/buildbot/all/x86%20mvlgcc%20trunk/builds/497/step-test...
Anyone have ideas what might have caused this?
n
--
---------- Forwarded message ----------
From: buildbot@python.org <buildbot@python.org>
Date: Apr 30, 2007 11:17 PM
Subject: [Python-checkins] buildbot warnings in x86 mvlgcc trunk
To: python-checkins@python.org
The Buildbot has detected a new failure of x86 mvlgcc trunk. Full details are available at: http://www.python.org/dev/buildbot/all/x86%2520mvlgcc%2520trunk/builds/497
Buildbot URL: http://www.python.org/dev/buildbot/all/
Build Reason:
Build Source Stamp: [branch trunk] HEAD
Blamelist: georg.brandl
Build had warnings: warnings test
Excerpt from the test logfile:
    make: *** [buildbottest] Aborted (core dumped)
sincerely, -The Buildbot
_______________________________________________
Python-checkins mailing list
Python-checkins@python.org
http://mail.python.org/mailman/listinfo/python-checkins

In rev 54982 (the first time this crash was seen), I see something which might create a problem. In python/trunk/Modules/posixmodule.c (near line 6300):
    + PyMem_FREE(mode);
    Py_END_ALLOW_THREADS
Can you call PyMem_FREE() without the GIL held? I couldn't find it documented either way.
Of the 3 failures I know of, below is the intersection of the tests that were run prior to crashing:
set(['test_threadedtempfile', 'test_cgi', 'test_dircache', 'test_set', 'test_binascii', 'test_imp', 'test_multibytecodec', 'test_weakref', 'test_ftplib', 'test_posixpath', 'test_xmlrpc', 'test_urllibnet', 'test_old_mailbox', 'test_distutils', 'test_site', 'test_runpy', 'test_fork1', 'test_traceback'])
n
--
On 4/30/07, Neal Norwitz <nnorwitz@gmail.com> wrote:
> This is the third time I've seen a crash on 2 different machines. This is the first time I noticed this unexplained crash:
> http://python.org/dev/buildbot/all/amd64%20gentoo%20trunk/builds/1983/step-t...
> That was at r54982.
> I tried to reproduce this: with a non-debug build, with a debug build, with valgrind with both types of build. I could never reproduce it. Valgrind did not report any errors either.
> Here is the third failure:
> http://python.org/dev/buildbot/all/amd64%20gentoo%20trunk/builds/1986/step-t...
> The failure below prints: python: Objects/obmalloc.c:746: PyObject_Malloc: Assertion `bp != ((void *)0)' failed.
> which probably doesn't really help since the corruption has already occurred. See http://python.org/dev/buildbot/all/x86%20mvlgcc%20trunk/builds/497/step-test...
> Anyone have ideas what might have caused this?
> n
> --
> ---------- Forwarded message ----------
> From: buildbot@python.org <buildbot@python.org>
> Date: Apr 30, 2007 11:17 PM
> Subject: [Python-checkins] buildbot warnings in x86 mvlgcc trunk
> To: python-checkins@python.org
> The Buildbot has detected a new failure of x86 mvlgcc trunk. Full details are available at: http://www.python.org/dev/buildbot/all/x86%2520mvlgcc%2520trunk/builds/497
> Buildbot URL: http://www.python.org/dev/buildbot/all/
> Build Reason:
> Build Source Stamp: [branch trunk] HEAD
> Blamelist: georg.brandl
> Build had warnings: warnings test
> Excerpt from the test logfile: make: *** [buildbottest] Aborted (core dumped)
> sincerely, -The Buildbot

On 5/1/07, Neal Norwitz <nnorwitz@gmail.com> wrote:
> In rev 54982 (the first time this crash was seen), I see something which might create a problem. In python/trunk/Modules/posixmodule.c (near line 6300):
>     + PyMem_FREE(mode);
>     Py_END_ALLOW_THREADS
The PyMem_MALLOC call that creates 'mode' is also called without explicitly holding the GIL.
> Can you call PyMem_FREE() without the GIL held? I couldn't find it documented either way.
I believe the GIL does not need to be held, but obviously Tim or someone with more memory experience should step in to say definitively.
If you look at Include/pymem.h, PyMem_FREE gets defined as PyObject_FREE in a debug build. PyObject_Free is defined at _PyObject_DebugFree. That function checks that the memory has not been written with the debug bit pattern and then calls PyObject_Free. That call just sticks the memory back into pymalloc's memory pool which is implemented without using any Python objects.
In other words no Python objects are used in pymalloc (to my knowledge) and thus is safe to use without the GIL.
-Brett

> I believe the GIL does not need to be held, but obviously Tim or someone with more memory experience should step in to say definitively.
> If you look at Include/pymem.h, PyMem_FREE gets defined as PyObject_FREE in a debug build. PyObject_Free is defined at _PyObject_DebugFree. That function checks that the memory has not been written with the debug bit pattern and then calls PyObject_Free. That call just sticks the memory back into pymalloc's memory pool which is implemented without using any Python objects.
> In other words no Python objects are used in pymalloc (to my knowledge)
This is also what I found.
> and thus is safe to use without the GIL.
but I got to a different conclusion. If it really goes through the pymalloc pool (obmalloc), then it must hold the GIL while doing so. obmalloc itself is not thread-safe, and relies on the GIL for thread-safety.
In release mode, PyMEM_FREE goes directly to free, which is thread-safe.
Regards, Martin

> but I got to a different conclusion. If it really goes through the pymalloc pool (obmalloc), then it must hold the GIL while doing so. obmalloc itself is not thread-safe, and relies on the GIL for thread-safety.
> In release mode, PyMEM_FREE goes directly to free, which is thread-safe.
Yes. It is quite unfortunate how PyMem_* gets redirected to the PyObject_* functions in debug builds. Even worse is how PyObject_Malloc gets #defined to PyObject_DebugMalloc for debug builds, changing linkage of modules. But that is a different matter.
One thing I'd like to point out however, is that it is quite unnecessary for the PyObject_DebugMalloc() functions to lie on top of PyObject_Malloc(). They can just call malloc() etc. directly, since in debug builds the performance benefit of the block allocator is moot. I'd suggest to keep the debug functions as a thin layer on top of malloc to do basic testing.
I'd even suggest that we reverse things, and move the debug library to pymem.c. This would keep the debug functionality threadsafe on top of regular malloc, rather than wrapping it in there with the non-threadsafe object allocator. We would then have
    void *PyMem_DebugMalloc() /* layers malloc */
    void *PyMem_Malloc() /* calls PyMem_MALLOC */
    #ifndef _DEBUG
    #define PyMem_MALLOC malloc
    #else
    #define PyMem_MALLOC PyMem_DebugMalloc
    #endif
PyObject_Malloc() would then just call PyMem_DebugMalloc in DEBUG builds.
The reason I have opinions on this is that at CCP we have spent considerable effort on squeezing our own veneer functions into the memory allocators, both for the PyMem ones and PyObject. And the structure of the macros and their interconnectivity really doesn't make it easy. We ended up creating a set of macros like PyMem_MALLOC_INNER() and ease our functions between the MALLOC and the INNER.
I'll try to show you the patch one day which is a reasonable attempt at a slight reform in the structure of these memory APIs. Perhaps something for Py3K.
Kristjan
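To make the "thin layer on top of malloc" idea concrete, here is a minimal sketch. It is not CPython's debug allocator and not the CCP patch mentioned above: the names dbg_malloc, dbg_check and dbg_free, the header and tail sizes, and the guard byte value are all assumptions chosen for illustration. The point it tries to show is that such a layer keeps no shared pool state of its own, so it is as thread-safe as the platform malloc/free underneath it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: [size + guard bytes][user data][guard bytes]. */
#define DBG_HEADER_SIZE ((size_t)16)  /* assumed to preserve malloc alignment */
#define DBG_TAIL_SIZE   ((size_t)8)
#define DBG_GUARD_BYTE  0xFB          /* illustrative guard pattern */

/* Allocate nbytes of user data wrapped in guard bytes, straight from malloc. */
static void *dbg_malloc(size_t nbytes)
{
    unsigned char *raw;

    if (nbytes > (size_t)-1 - DBG_HEADER_SIZE - DBG_TAIL_SIZE)
        return NULL;                              /* size would overflow */
    raw = (unsigned char *)malloc(DBG_HEADER_SIZE + nbytes + DBG_TAIL_SIZE);
    if (raw == NULL)
        return NULL;
    memset(raw, DBG_GUARD_BYTE, DBG_HEADER_SIZE); /* leading guard */
    memcpy(raw, &nbytes, sizeof nbytes);          /* remember the request size */
    memset(raw + DBG_HEADER_SIZE + nbytes, DBG_GUARD_BYTE, DBG_TAIL_SIZE);
    return raw + DBG_HEADER_SIZE;
}

/* Return 1 if the guard bytes around p are intact, 0 if they were overwritten. */
static int dbg_check(const void *p)
{
    const unsigned char *user = (const unsigned char *)p;
    const unsigned char *raw = user - DBG_HEADER_SIZE;
    size_t nbytes, i;

    memcpy(&nbytes, raw, sizeof nbytes);
    for (i = sizeof nbytes; i < DBG_HEADER_SIZE; i++)
        if (raw[i] != DBG_GUARD_BYTE)
            return 0;
    for (i = 0; i < DBG_TAIL_SIZE; i++)
        if (user[nbytes + i] != DBG_GUARD_BYTE)
            return 0;
    return 1;
}

/* Check the guards, then hand the block back to the platform free(). */
static int dbg_free(void *p)
{
    int ok;

    if (p == NULL)
        return 1;
    ok = dbg_check(p);
    free((unsigned char *)p - DBG_HEADER_SIZE);
    return ok;
}
```

A caller would pair dbg_malloc with dbg_free and treat a 0 return from dbg_free as a detected overrun or underrun. The sketch leaves out what a real debug layer would also want, such as realloc support, serial numbers, and filling freed memory.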

Kristján> I'd suggest to keep the debug functions as a thin layer on top
Kristján> of malloc to do basic testing.
But then you would substantially change the memory access behavior of the program in a debug build, that is, more than it is already changed by the fact that you have changed the memory layout of Python objects.
Skip

-----Original Message-----
From: skip@pobox.com [mailto:skip@pobox.com]
Sent: Tuesday, May 01, 2007 20:46
> But then you would substantially change the memory access behavior of the program in a debug build, that is, more than it is already changed by the fact that you have changed the memory layout of Python objects.
Well, as we say in Iceland, that is a piece difference, not the whole sheep. In fact, most of the memory is already managed by the Object allocator, so there is only slight additional change. Further, at least one platform (windows) already employs a different memory allocator implementation for malloc in debug builds, namely a debug allocator. In addition, many python structures grow extra members in debug builds, most notably the PyObject _head. So you probably never have the exactly same blocksize pattern anyway. At any rate, the debug memory only changes memory access patterns by growing every block by a fixed amount.
And why do we want to keep the same memory pattern? Isn't the memory allocator supposed to be a black box? The only reason I can see for maintaining the exact same pattern in a debug build is to reproduce some sort of memory access error, but that is precisely what the debug routines are for.
Admittedly, I have never used the debug routines much. I generally disable the object allocator for debug builds, and rely on the windows debug malloc implementation to spot errors for me, or failing that, I use Rational Purify, which costs money.

[Neal Norwitz]
>> In rev 54982 (the first time this crash was seen), I see something which might create a problem. In python/trunk/Modules/posixmodule.c (near line 6300):
>>     + PyMem_FREE(mode);
>>     Py_END_ALLOW_THREADS
Shouldn't do that.
[Brett Cannon]
> The PyMem_MALLOC call that creates 'mode' is also called without explicitly holding the GIL.
Or that ;-)
>> Can you call PyMem_FREE() without the GIL held? I couldn't find it documented either way.
> I believe the GIL does not need to be held, but obviously Tim or someone with more memory experience should step in to say definitively.
The GIL should be held. The relevant docs are in the Python/C API manual, section "8.1 Thread State and the Global Interpreter Lock":
    Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions.
PyMem_XYZ is certainly a "Python/C API function". There are functions you can call without holding the GIL, and section 8.1 intends to give an exhaustive list of those. These are functions that can't rely on the GIL, like PyEval_InitThreads() (which /creates/ the GIL), and various functions that create and destroy thread and interpreter state.
> If you look at Include/pymem.h, PyMem_FREE gets defined as PyObject_FREE in a debug build. PyObject_Free is defined at _PyObject_DebugFree. That function checks that the memory has not been written with the debug bit pattern and then calls PyObject_Free. That call just sticks the memory back into pymalloc's memory pool which is implemented without using any Python objects.
But pymalloc's pools have a complex internal structure of their own, and cannot be mucked with safely by multiple threads simultaneously.
> In other words no Python objects are used in pymalloc (to my knowledge) and thus is safe to use without the GIL.
Nope. For example, if two threads simultaneously try to free objects in the same obmalloc size class, there are a number of potential thread-race disasters in linking the objects into the same size-class chain.
In a release build this doesn't matter, since PyMem_XYZ map directly to the platform malloc/realloc/free, and so inherit the thread safety (or lack thereof) of the platform C implementations.
If it's necessary to do malloc/free kinds of things without holding the GIL, then the platform malloc/free must be called directly. Perhaps that's what posixmodule.c wants to do in this case.
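The ordering rule described above can be illustrated with a toy model. Everything here is a stand-in invented for the sketch: toy_gil_held, the TOY_* macros, and toy_pool_free are not CPython names, and the real posixmodule.c code is not reproduced. The toy pool simply refuses to touch its shared free list unless the pretend GIL is held, which makes the difference between freeing before and after the END macro visible in a single thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the GIL: 1 while the current thread "holds" it. */
static int toy_gil_held = 1;

/* Toy stand-ins for Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS. */
#define TOY_BEGIN_ALLOW_THREADS do { toy_gil_held = 0; } while (0)
#define TOY_END_ALLOW_THREADS   do { toy_gil_held = 1; } while (0)

/* A shared singly linked free list, like one size-class chain. */
typedef struct toy_block { struct toy_block *next; } toy_block;
static toy_block *toy_freelist = NULL;
static size_t toy_freelist_len = 0;

/* Link a block into the shared chain. The read of the list head and the
 * two stores below are not atomic, so this is only safe under the lock.
 * Returns 0 on success, -1 if called without the toy GIL. */
static int toy_pool_free(void *p)
{
    toy_block *b = (toy_block *)p;

    if (!toy_gil_held)
        return -1;
    b->next = toy_freelist;
    toy_freelist = b;
    toy_freelist_len++;
    return 0;
}

/* Buggy ordering: the pool is touched inside the unlocked region. */
static int toy_call_buggy(void)
{
    void *mode = malloc(sizeof(toy_block));
    int rc;

    if (mode == NULL)
        return -2;
    TOY_BEGIN_ALLOW_THREADS;
    /* ... blocking system call would run here ... */
    rc = toy_pool_free(mode);       /* refused: toy GIL not held */
    TOY_END_ALLOW_THREADS;
    if (rc != 0)
        free(mode);                 /* avoid leaking in the sketch */
    return rc;
}

/* Fixed ordering: re-acquire first, then return the block to the pool. */
static int toy_call_fixed(void)
{
    void *mode = malloc(sizeof(toy_block));

    if (mode == NULL)
        return -2;
    TOY_BEGIN_ALLOW_THREADS;
    /* ... blocking system call would run here ... */
    TOY_END_ALLOW_THREADS;
    return toy_pool_free(mode);     /* toy GIL held again */
}
```

In real extension code the equivalent choice is either to move the PyMem_FREE call after Py_END_ALLOW_THREADS, or to allocate and release the buffer with the platform malloc/free so that no pymalloc state is touched while the GIL is released.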

On 5/1/07, Tim Peters <tim.peters@gmail.com> wrote:
> [Neal Norwitz]
>>> In rev 54982 (the first time this crash was seen), I see something which might create a problem. In python/trunk/Modules/posixmodule.c (near line 6300):
>>>     + PyMem_FREE(mode);
>>>     Py_END_ALLOW_THREADS
> Shouldn't do that.
> [Brett Cannon]
>> The PyMem_MALLOC call that creates 'mode' is also called without explicitly holding the GIL.
> Or that ;-)
Luckily I misread the code so it doesn't do that boo-boo.
>>> Can you call PyMem_FREE() without the GIL held? I couldn't find it documented either way.
>> I believe the GIL does not need to be held, but obviously Tim or someone with more memory experience should step in to say definitively.
> The GIL should be held. The relevant docs are in the Python/C API manual, section "8.1 Thread State and the Global Interpreter Lock":
>     Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions.
> PyMem_XYZ is certainly a "Python/C API function". There are functions you can call without holding the GIL, and section 8.1 intends to give an exhaustive list of those. These are functions that can't rely on the GIL, like PyEval_InitThreads() (which /creates/ the GIL), and various functions that create and destroy thread and interpreter state.
>> If you look at Include/pymem.h, PyMem_FREE gets defined as PyObject_FREE in a debug build. PyObject_Free is defined at _PyObject_DebugFree. That function checks that the memory has not been written with the debug bit pattern and then calls PyObject_Free. That call just sticks the memory back into pymalloc's memory pool which is implemented without using any Python objects.
> But pymalloc's pools have a complex internal structure of their own, and cannot be mucked with safely by multiple threads simultaneously.
Ah, OK. That makes sense. Glad I pointed out my ignorance then. =)
-Brett

Neal Norwitz <nnorwitz <at> gmail.com> writes:
> Can you call PyMem_FREE() without the GIL held? I couldn't find it documented either way.
Nope. See comments at the top of Python/pystate.c.
Cheers, mwh
participants (7)
- "Martin v. Löwis"
- Brett Cannon
- Kristján Valur Jónsson
- Michael Hudson
- Neal Norwitz
- skip@pobox.com
- Tim Peters