How to enable tracemalloc for the test suite?
I've got a memory issue in my modified Python interpreter I'm trying to debug. Output at the end of the problematic unit test looks like this: ... == Tests result: FAILURE then SUCCESS == 1 test OK. 1 re-run test: test_rattlesnake Total duration: 2.9 sec Tests result: FAILURE then SUCCESS Debug memory block at address p=0x55a227969080: API '' 0 bytes originally requested The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd): at p-7: 0x00 *** OUCH at p-6: 0x00 *** OUCH at p-5: 0x00 *** OUCH at p-4: 0x00 *** OUCH at p-3: 0x00 *** OUCH at p-2: 0x00 *** OUCH at p-1: 0x00 *** OUCH Because memory is corrupted at the start, the count of bytes requested may be bogus, and checking the trailing pad bytes may segfault. The 8 pad bytes at tail=0x55a227969080 are not all FORBIDDENBYTE (0xfd): at tail+0: 0x00 *** OUCH at tail+1: 0x00 *** OUCH at tail+2: 0x00 *** OUCH at tail+3: 0x00 *** OUCH at tail+4: 0x00 *** OUCH at tail+5: 0x00 *** OUCH at tail+6: 0x00 *** OUCH at tail+7: 0x00 *** OUCH The block was made by call #0 to debug malloc/realloc. Enable tracemalloc to get the memory block allocation traceback Looking at the tracemalloc module docs and trying various command line args (-X tracemalloc=5) or environment variables (PYTHONTRACEMALLOC=5) I'm unable to provoke any different output. The module docs give a tantalizing suggestion: Example of output of the Python test suite: but I see no command line args related to running the test suite with tracemalloc enabled. Pointers appreciated. Skip
Hi Skip, I wrote the feature (both tracemalloc and query tracemalloc when a buffer overflow is detected), so I should be able to help you ;-) Le dim. 5 avr. 2020 à 00:27, Skip Montanaro <skip.montanaro@gmail.com> a écrit :
Looking at the tracemalloc module docs and trying various command line args (-X tracemalloc=5) or environment variables (PYTHONTRACEMALLOC=5)
Yes, that's the right way to enable it. The env var is inherited by subprocesses.
I'm unable to provoke any different output.
Maybe your test runs Python with -I or -E which ignores the environment variable. Which command do do you run to run tests? It's unclear from your output which test and which process triggers the "Debug memory block at address" bug. Maybe you are out of lock and the buffer corruption is only detected (the memory is only deallocated) after tracemalloc was disabled by _PyTraceMalloc_Fini() call in Py_FinalizeEx(). You may try to hack Python by commenting this _PyTraceMalloc_Fini() call. FYI there is an unit test on the debug hooks on memory allocator to ensure that it detects buffer overflow: test_capi.test_buffer_overflow() checks for "Debug memory block at address (...)". Example: ------------------------ $ cat bug.py import _testcapi _testcapi.pymem_buffer_overflow() $ python3 bug.py Segmentation fault (core dumped) $ python3 -X tracemalloc=5 -X dev bug.py Debug memory block at address p=0x7f225fd1c100: API 'm' 16 bytes originally requested The 7 pad bytes at p-7 are FORBIDDENBYTE, as expected. The 8 pad bytes at tail=0x7f225fd1c110 are not all FORBIDDENBYTE (0xfd): at tail+0: 0x78 *** OUCH at tail+1: 0xfd at tail+2: 0xfd at tail+3: 0xfd at tail+4: 0xfd at tail+5: 0xfd at tail+6: 0xfd at tail+7: 0xfd The block was made by call #56367 to debug malloc/realloc. Data at p: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd Memory block allocated at (most recent call first): File "bug.py", line 2 Fatal Python error: bad trailing pad byte Current thread 0x00007f226cda56c0 (most recent call first): File "bug.py", line 2 in <module> Aborted (core dumped) ------------------------ Tracemalloc adds "Memory block allocated at" traceback. By the way, I started to also suggest enabling tracemalloc when a ResourceWarning is logged, to see where the leaked resource was allocated. Victor
Victor> I wrote the feature (both tracemalloc and query tracemalloc when a Victor> buffer overflow is detected), so I should be able to help you ;-) Yes, I thought you might. :-) I've attached the output of a more complete run. The command is % PYTHONTRACEMALLOC=5 ./python ./Tools/scripts/run_tests.py -R 5:50:reflog.txt test_rattlesnake where test_rattlesnake.py has been cut down to a single unit test, which presumably is the one which exercises the problematic code. I don't get the error about writing off either end of the buffer unless I set the second arg pretty high (it succeeds at 40, fails at 45). I'll also quote one part of Tim's response: Tim> To my eyes, you left out the most important part ;-) A traceback Tim> showing who made the fatal free() call to begin with. Which is correct as far as that goes, but I hadn't yet given up all hope of figuring things out. I was more concerned with why I couldn't (and still can't) get tracemalloc to sing and dance. :-) I thought setting PYTHONTRACEMALLOC should provoke some useful output, but I was confused into thinking I was (am?) still missed something because it continued to produce this message: Enable tracemalloc to get the memory block allocation traceback which suggests to me tracemalloc still isn't enabled. That's emitted from Modules/_tracemalloc.c and seems to be properly protected: if (!_Py_tracemalloc_config.tracing) { PUTS(fd, "Enable tracemalloc to get the memory block " "allocation traceback\n\n"); return; } so I think there is still more to do. I was worried enough that I might have misspelled the environment variable name that I ran again after copying it from the documentation. No such Doh! moment for me, though I suspect it might be coming. :-/ FWIW, the register branch of my CPython fork: https://github.com/smontanaro/cpython/tree/register is what I'm working with at the moment. Current master has been merged to it. It should be exactly what is failing for me. (Note that I'm not asking for help there, just pointing out for the curious where my busted code is.) Thanks for both of your responses. Skip
[Skip Montanaro <skip.montanaro@gmail.com>]
Ah, I overlooked that.
Agreed - for whatever reason, tracemalloc is not enabled for you at the time your run dies, despite your
% PYTHONTRACEMALLOC=5 ./python ./Tools/scripts/run_tests.py -R 5:50:reflog.txt test_rattlesnake
command line. Have to leave that mystery for Victor! So between the "obvious" possibilities: - something is going nuts spraying 0 all over memory - the pointer passed to free() wasn't actually obtained from a malloc/realloc function there's nothing in the output to favor one over the other. Except that dereferencing addresses near the pointer didn't blow up (null bytes were returned), so the pointer is at least in the process's address space.
Le dim. 5 avr. 2020 à 04:02, Skip Montanaro <skip.montanaro@gmail.com> a écrit :
gdb traceback: (gdb) where (...) #2 0x000000000056671d in fatal_error_exit (status=-1) at Python/pylifecycle.c:2183 (...) #8 0x000000000047d61e in PyObject_Free (ptr=0x7d3e40 <_Py_FalseStruct>) at Objects/obmalloc.c:709 #9 0x0000000000498a9b in object_dealloc (self=False) at Objects/typeobject.c:3824 #10 0x000000000047c5aa in _Py_Dealloc (op=False) at Objects/object.c:2206 #11 0x00000000005f771c in _Py_DECREF (filename=0x741e7d "./Modules/_io/iobase.c", lineno=275, op=False) at ./Include/object.h:425 #12 0x00000000005f7f16 in iobase_finalize (self=<_io.TextIOWrapper at remote 0x7fffeab2a6e0>) at ./Modules/_io/iobase.c:275 #13 0x0000000000477755 in PyObject_CallFinalizer (self=<_io.TextIOWrapper at remote 0x7fffeab2a6e0>) at Objects/object.c:193 #14 0x00000000004777ea in PyObject_CallFinalizerFromDealloc (self=<_io.TextIOWrapper at remote 0x7fffeab2a6e0>) at Objects/object.c:211 #15 0x00000000005f7fd8 in _PyIOBase_finalize (self=<_io.TextIOWrapper at remote 0x7fffeab2a6e0>) at ./Modules/_io/iobase.c:319 #16 0x0000000000609533 in textiowrapper_dealloc (self=0x7fffeab2a6e0) at ./Modules/_io/textio.c:1433 (...) #48 0x0000000000595fa7 in delete_garbage (tstate=0x82dc80, gcstate=0x82c648, collectable=0x7fffffffd2c0, old=0x82c690) at Modules/gcmodule.c:1005 #49 0x00000000005965c8 in collect (tstate=0x82dc80, generation=2, n_collected=0x0, n_uncollectable=0x0, nofail=1) at Modules/gcmodule.c:1275 #50 0x00000000005980e7 in _PyGC_CollectNoFail () at Modules/gcmodule.c:2099 #51 0x0000000000546da8 in _PyImport_Cleanup (tstate=0x82dc80) at Python/import.c:633 #52 0x0000000000564f89 in Py_FinalizeEx () at Python/pylifecycle.c:1416 #53 0x000000000041ed1b in Py_RunMain () at Modules/main.c:634 #54 0x000000000041ed90 in pymain_main (args=0x7fffffffd500) at Modules/main.c:662 #55 0x000000000041ee0a in Py_BytesMain (argc=6, argv=0x7fffffffd628) at Modules/main.c:686 #56 0x000000000041d786 in main (argc=6, argv=0x7fffffffd628) at ./Programs/python.c:16 (gdb) p *res $6 = { ob_refcnt = 0, ob_type = 0x7d3ca0 <PyBool_Type> } (gdb) p *(PyLongObject*)res $7 = { ob_base = { ob_base = { ob_refcnt = 0, ob_type = 0x7d3ca0 <PyBool_Type> }, ob_size = 0 }, ob_digit = {0} } (gdb) p res == ((PyObject *) &_Py_FalseStruct) $8 = 1 Py_FinalizeEx() cleans up all modules with _PyImport_Cleanup(). When Python destroys a _io.TextIOWrapper instance, Py_False is destroyed... Py_False must never me deallocated: it's statically allocated. Py_False reference counter should not decrease to 0. It seems like your Python changes use Py_False "somewhere" without Py_INCREF(Py_False). -- It seems like you are working on a register-based bytecode, cool. I recall that you told me something like that ;-) It seems like executing the register-based bytecode of the following function is enough to steal one reference to False (use False without INCREF): def _long_block(s, b): if s > b: return s return b Maybe it's COMPARE_OP_REG() which calls SETLOCAL(dst, False). Maybe it's JUMP_IF_FALSE_REG() which has suspicious code: DECREF without clearing GETLOCAL(src). PyObject *cond = GETLOCAL(src); .. Py_DECREF(cond); Maybe SETLOCAL(src, NULL); is need somewhere. Maybe it's something else. Replacing stack-based bytecode with register-based bytecode requires to rethink the lifetime of registers... I had a hard time to fix my old "registervm" project to fix the register lifetime: I added CLEAR_REG bytecode to explicitly clear a register. Using a stack, all "CLEAR_REG" operation are implicit. You have to make them explicit. Hopefully, a compiler can easily reuse registers and remove most CLEAR_REG. -- My old project (2012, OMG, I'm getting old!): https://faster-cpython.readthedocs.io/registervm.html https://hg.python.org/sandbox/registervm/file/tip/REGISTERVM.txt Victor -- Night gathers, and now my watch begins. It shall not end until my death.
It seems like your Python changes use Py_False "somewhere" without Py_INCREF(Py_False).
Maybe it's COMPARE_OP_REG() which calls SETLOCAL(dst, False).
Yes, this was the problem. Thanks for the fix. Too much blind adherence on my part to the existing COMPARE_OP logic. I've even written (relatively speaking) tomes about it in both my in-progress PEP as well as in various comments throughout the code. I don't think I had all that sorted out in my mind before implementing the first few instructions. Fortunately, I'm not too far into implementing the actual instructions. I should be able to easily go back and desk check the others.
I'm trying it the simplest way I can think of. Registers are exactly like local variables, so SETLOCAL Py_XDECREFs whatever is already there before overwriting it with a new value. At the end of _PyEval_EvalFrameDefault if the code object's co_flags includes the (new) CO_REGISTER flag, it loops over the stack/register space calling Py_CLEAR. The stack/register space is also Py_CLEAR'd when the frame is first allocated. Skip
For posterity, just recording best guesses for the other mysteries left hanging: - PYTHONTRACEMALLOC didn't work for you because Victor's traceback showed that Py_FinalizeEx was executing _PyImport_Fini,, one statement _after_ it disabled tracemalloc via _PyTraceMalloc_Fini. - The address passed to free() was for the struct representing the False singleton, which is static Since it wasn't obtained from a malloc/realloc call to begin with, the free() debug code didn't find anything it expected to find. As noted earlier, if tracemalloc had still been active, it wouldn't have found that address in its database, and so would have produced no output at all. So - that was easy ;-)
Le dim. 5 avr. 2020 à 23:07, Skip Montanaro <skip.montanaro@gmail.com> a écrit :
Let's say that you call "func()". The stack-based bytecode "LOAD_GLOAL 0; CALL_FUNC 0" becomes register-based bytecode "LOAD_GLOBAL_REG %r1, 0; CALL_FUNC_REG %r2, %r1": the function result is stored into %r2 register. The problem is that if the %r2 is not cleared explicitly, you change the object lifetime and so the Python semantics. The register %r2 must be cleared explicitly. Same issue applies for %r1 (which stores a global variable). The correct code should be "LOAD_GLOBAL_REG %r1, 0; CALL_FUNC_REG %r2, %r1; CLEAR_REG %r1; CLEAR_REG %r2". A compiler can optimize it to: "LOAD_GLOBAL_REG %r1, 0; CALL_FUNC_REG %r1, %r1; CLEAR_REG %r1" (remove %r2: reuse %r1). For example, I expect to get immediately a ResourceWarning when running the following script demo.py with python3 -Wd demo.py: --- import time def func(): open("/etc/issue") func() # ResourceWarning expected here time.sleep(60) --- Victor -- Night gathers, and now my watch begins. It shall not end until my death.
[Skip Montanaro <skip.montanaro@gmail.com>]
I've got a memory issue in my modified Python interpreter I'm trying to debug. Output at the end of the problematic unit test looks like this:
To my eyes, you left out the most important part ;-) A traceback showing who made the fatal free() call to begin with. In debug mode, allocations are padded on both ends with various stuff: FORBIDDENBYTEs (0xfd), a count of the number of bytes originally requested, and a serial number incremented by 1 each time alloc/realloc is called. The memory you requested is entirely filled with CLEANBYTEs ()xcd). On deallocation, that stuff is checked (that's where your run is dying), and the memory is filled with DEADBYTEs (0xdd) (a weak but nevertheless often effective way to detect "double free"s). Now in your case, absolutely every byte shown is 0x00. There are no useful clues at all remaining. It's not just a short buffer overrun, _everything_ has been NULLed out, including before the requested memory. So something has gone nuts spraying 0 all over memory ... or the pointer passed to free() was nonsense to begin with. If it were a case of double free, we'd usually expect to see some DEADBYTEs in the debug output. But there weren't any. However, you did not get any output from tracemalloc, despite that it sounds like you set up the right magic to activate it. Which leans toward the "complete nonsense" hypothesis. If tracemalloc doesn't find the address in its dict of currently-allocated addresses, looks like it fails silently, simply not producing any output. Which is what you're experiencing. So where was free() called and what was passed to it? All the evidence is consistent with the idea that what was passed to free() was never obtained from a malloc or reallloc call to begin with.
participants (3)
-
Skip Montanaro
-
Tim Peters
-
Victor Stinner