assert failure on obmalloc

Failure running the test suite today with -u compiler enabled on Windows XP.

test_logging
Assertion failed: bp != NULL, file \code\python\dist\src\Objects\obmalloc.c, line 604

The debugger says the error is here:

    msvcr71d.dll!_assert(const char * expr=0x1e22bcc0, const char * filename=0x1e22bc94, unsigned int lineno=604)  Line 306  C
    python24_d.dll!PyObject_Malloc(unsigned int nbytes=100)  Line 604 + 0x1b  C
    python24_d.dll!_PyObject_DebugMalloc(unsigned int nbytes=84)  Line 1014 + 0x9  C
    python24_d.dll!PyThreadState_New(_is * interp=0x00951028)  Line 136 + 0x7  C
    python24_d.dll!PyGILState_Ensure()  Line 430 + 0xc  C
    python24_d.dll!t_bootstrap(void * boot_raw=0x02801d48)  Line 431 + 0x5  C
    python24_d.dll!bootstrap(void * call=0x04f0d264)  Line 166 + 0x7  C
    msvcr71d.dll!_threadstart(void * ptd=0x026a2320)  Line 196 + 0xd  C

I've been seeing this sort of error on-and-off for at least a year with my Python 2.3 install. It's the usual reason my spambayes popproxy dies. I can't recall seeing it before on Windows or while running the test suite.

Jeremy

Jeremy Hylton <jhylton@gmail.com> writes:
Don't debug builds route all PyMem_ calls through PyMalloc? Doesn't pymalloc rely on the GIL being held when it's called?

If both of these are true, there's an obvious problem here, because the call to PyMem_NEW in PyThreadState_New certainly isn't called with the GIL held...

This would only be a problem in a debug build, though.

Cheers,
mwh

--
Never meddle in the affairs of NT. It is slow to boot and quick to crash.
  -- Stephen Harris -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html

[Michael Hudson]
> Don't debug builds route all PyMem_ calls through PyMalloc?

Indeed they do.

> Doesn't pymalloc rely on the GIL being held when it's called?

Indeed it does.

Indeed that sucks.

> This would only be a problem in a debug build, though.

So it's Jeremy's fault, just as we suspected all along.

There are lock macros in obmalloc, which currently expand to nothing. They could be changed to "do something" in a debug build, but I'd rather not -- the debug capabilities of obmalloc are more useful the nastier a memory corruption problem is, and few things make problems nastier than throwing threads into the mix. A cheap trick is to ensure that all code that may be called without the GIL calls the platform malloc()/free() directly.

Alas, I haven't been able to reproduce Jeremy's symptom.
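
To make the "cheap trick" concrete, here is a minimal sketch of the idea, not obmalloc's actual code. Every name in it (TOY_LOCK, toy_pool_malloc, toy_raw_malloc, and so on) is hypothetical and invented for illustration. It assumes a toy free-list allocator whose lock macros expand to nothing, so it is only safe when the caller already holds an outer lock (the role the GIL plays above), plus a pair of raw wrappers that go straight to the platform malloc()/free() for code that may run without that lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical lock hooks for a toy allocator.  They expand to nothing,
   so the pool functions below are only safe when the caller already
   holds some outer lock. */
#define TOY_LOCK()
#define TOY_UNLOCK()

#define TOY_POOL_BLOCKS 8
#define TOY_BLOCK_SIZE  64

typedef union toy_block {
    union toy_block *next;                  /* link while on the free list */
    unsigned char bytes[TOY_BLOCK_SIZE];    /* payload while allocated */
} toy_block;

static toy_block toy_pool[TOY_POOL_BLOCKS];
static toy_block *toy_free_list = NULL;
static int toy_initialized = 0;

/* Push every block of the static pool onto the free list. */
static void toy_init(void)
{
    int i;
    for (i = 0; i < TOY_POOL_BLOCKS; i++) {
        toy_pool[i].next = toy_free_list;
        toy_free_list = &toy_pool[i];
    }
    toy_initialized = 1;
}

/* Pool allocation: reads and writes shared state with no real locking.
   Returns NULL if the request is too large or the pool is exhausted. */
void *toy_pool_malloc(size_t nbytes)
{
    toy_block *bp;

    if (nbytes > TOY_BLOCK_SIZE)
        return NULL;
    TOY_LOCK();
    if (!toy_initialized)
        toy_init();
    bp = toy_free_list;
    if (bp != NULL)
        toy_free_list = bp->next;
    TOY_UNLOCK();
    return bp;
}

/* Return a block to the head of the free list. */
void toy_pool_free(void *p)
{
    toy_block *bp = p;

    if (bp == NULL)
        return;
    TOY_LOCK();
    bp->next = toy_free_list;
    toy_free_list = bp;
    TOY_UNLOCK();
}

/* The "cheap trick": code that may run without the outer lock bypasses
   the pool and calls the platform allocator directly. */
void *toy_raw_malloc(size_t nbytes)
{
    return malloc(nbytes ? nbytes : 1);
}

void toy_raw_free(void *p)
{
    free(p);
}
```

In this sketch a thread-bootstrap path would call toy_raw_malloc() and toy_raw_free(), while code that is known to hold the outer lock keeps using the pool functions. The trade-off is the one described above: the raw path gets none of the pool's debugging help, but it cannot corrupt the pool's free list.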

[Jeremy Hylton, on Tue, 7 Sep 2004]
I expect bug 1041645 is relevant. That was suffering from the debug-build problem identified later in this thread by Michael Hudson, and also from the fact that the internal TLS function find_key() was thread-insane (regardless of build type). Those should all be fixed now, so gripe if you see this again after updating.

BTW, those are all critical bugfixes, but I don't have time to backport them. If anyone does, there was one checkin in the chain that changed when PyGILState_Release() deleted its TLS key, but I'm pretty sure there was no actual need for that change.
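
The message does not say what exactly was wrong with find_key(), so the following is only a general illustration of a thread-safe find-or-insert lookup over a shared list, not CPython's code or the actual fix. All names (toy_entry, toy_find_or_insert, toy_mutex) are hypothetical, and the sketch assumes POSIX threads are available.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread key/value table, kept as a linked list. */
struct toy_entry {
    struct toy_entry *next;
    long thread_id;     /* which thread owns this entry */
    int key;            /* which TLS-style key it belongs to */
    void *value;
};

static struct toy_entry *toy_head = NULL;
static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Return the entry for (thread_id, key).  If there is none and value is
   non-NULL, insert a new entry and return it; otherwise return NULL.
   The search and the insert happen under one mutex hold, so another
   thread cannot modify the list in the middle of the walk. */
struct toy_entry *toy_find_or_insert(long thread_id, int key, void *value)
{
    struct toy_entry *p;

    pthread_mutex_lock(&toy_mutex);
    for (p = toy_head; p != NULL; p = p->next) {
        if (p->thread_id == thread_id && p->key == key)
            goto done;
    }
    /* Not found: p is NULL here. */
    if (value != NULL) {
        /* Platform malloc, so no outer interpreter lock is required. */
        p = malloc(sizeof *p);
        if (p != NULL) {
            p->thread_id = thread_id;
            p->key = key;
            p->value = value;
            p->next = toy_head;
            toy_head = p;
        }
    }
done:
    pthread_mutex_unlock(&toy_mutex);
    return p;
}
```

An existing entry is returned unchanged even when a different value is passed, so callers that want to overwrite a value would need a separate, equally locked, update step.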

participants (3)
- Jeremy Hylton
- Michael Hudson
- Tim Peters