Don't debug builds route all PyMem_ calls through pymalloc?
Indeed they do.
Doesn't pymalloc rely on the GIL being held when it's called?
Indeed it does.
If both of these are true, there's an obvious problem here, because the call to PyMem_NEW in PyThreadState_New certainly isn't made with the GIL held...
Indeed that sucks.
This would only be a problem in a debug build, though.
So it's Jeremy's fault, just as we suspected all along.
There are lock macros in obmalloc, which currently expand to nothing. They could be changed to "do something" in a debug build, but I'd rather not -- the debug capabilities of obmalloc are more useful the nastier a memory corruption problem is, and few things make problems nastier than throwing threads into the mix.
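For anyone wondering what "expand to nothing" versus "do something" looks like, here is a minimal sketch. It is not obmalloc's code: the DEMO_ names, the DEMO_DEBUG_LOCKING switch, and the live-block counter are all made up for illustration, and the locking variant assumes POSIX threads. It only shows the shape of the option I'd rather not take.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical build switch; nothing in CPython defines this name. */
#ifdef DEMO_DEBUG_LOCKING
#include <pthread.h>   /* assumption: a POSIX threads platform */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
#define DEMO_LOCK()   pthread_mutex_lock(&demo_mutex)
#define DEMO_UNLOCK() pthread_mutex_unlock(&demo_mutex)
#else
/* Default: the macros expand to nothing, so the caller must already
   hold some outer lock (the GIL, in the allocator being discussed). */
#define DEMO_LOCK()   ((void)0)
#define DEMO_UNLOCK() ((void)0)
#endif

/* Shared bookkeeping that is only safe to touch while "locked". */
static size_t demo_live_blocks = 0;

static void *demo_alloc(size_t nbytes)
{
    void *p;
    DEMO_LOCK();
    p = malloc(nbytes ? nbytes : 1);   /* never ask malloc for 0 bytes */
    if (p != NULL)
        demo_live_blocks++;
    DEMO_UNLOCK();
    return p;
}

static void demo_free(void *p)
{
    if (p == NULL)
        return;
    DEMO_LOCK();
    demo_live_blocks--;
    DEMO_UNLOCK();
    free(p);
}
```

Built without DEMO_DEBUG_LOCKING, the counter is updated with no protection at all, which is exactly the situation a caller without the GIL would create.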
A cheap trick is to ensure that all code that may be called without the GIL calls the platform malloc()/free() directly. Alas, I haven't been able to reproduce Jeremy's symptom.
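A rough sketch of the cheap trick, assuming a made-up per-thread record: demo_tstate and its two functions are hypothetical stand-ins, not the real PyThreadState code. The only point is that the allocation goes straight to the platform malloc()/free() instead of through a PyMem_ macro.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-thread state record. */
typedef struct demo_tstate {
    struct demo_tstate *next;
    long thread_id;
} demo_tstate;

/* May be called without the GIL, so it uses the platform allocator
   directly and never touches the GIL-protected allocator's state. */
static demo_tstate *demo_tstate_new(long thread_id)
{
    demo_tstate *ts = (demo_tstate *)malloc(sizeof(demo_tstate));
    if (ts == NULL)
        return NULL;
    memset(ts, 0, sizeof(*ts));
    ts->thread_id = thread_id;
    return ts;
}

/* Memory obtained from malloc() must go back through free(), not
   through the other allocator's free routine. */
static void demo_tstate_delete(demo_tstate *ts)
{
    assert(ts != NULL);   /* in this sketch, deleting nothing is a caller bug */
    free(ts);
}
```

The catch is that allocation and release have to agree: anything created this way must never be handed to the pymalloc side for freeing.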