Extracting python bytecode from a linux core dump?
I hope this is the proper place for internals questions. I have a core file (produced via the gcore command) of a Linux python2.6 process. I need to extract the bytecode and decompile it. I looked at https://wiki.python.org/moin/DebuggingWithGdb and related pages. However, these all seem to require a running process and/or a binary with debugging symbols. I'm thinking that the compiled bytecode is likely in an array or contiguous block of memory within the python executable's image and that there's probably a way to pull it out with gdb. Unsurprisingly, the pyc 0xd1f20d0a magic number isn't kept in memory. So, how do I find the memory holding the compiled bytecode?
In reply to CompuTinker's message to python-dev@python.org of Wednesday, June 7, 2017 13:26:

I can't help with the gdb commands, but I'd suggest starting from one of the global variables for the sys module, probably the modules dict. You'll have to reverse engineer the memory structures to find its values and each of their dicts, eventually finding function objects pointing to code objects, which in turn point to bytecode blobs. All of these structures are in the sources, so it shouldn't be that hard, just time-consuming (I've done it on Windows before with different tools).

If you know the code was being executed when the dump was made, you could look at the stack to find calls in the EvalFrame function. Those should have a local or argument pointing to the code object or bytecode (my memory of names and structures in 2.6 isn't that good).

A final alternative would be to find the function type object's address and search memory for it to locate function objects and then code objects. That might be the best option, if you can locate the type object in the dump.

Hope that helps,
Steve
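The last suggestion above (searching memory for the function type object's address) can be sketched roughly as follows. This is only an illustration under stated assumptions, not something from the thread: it assumes a 64-bit little-endian process and a non-debug CPython build, so that every object begins with `ob_refcnt` followed by `ob_type`. The function name `find_objects_of_type`, the type address, and the base address are all hypothetical, and the memory region is synthetic.

```python
import struct

# Assumptions (not from the thread): 64-bit little-endian process and a
# non-debug CPython build, so each object starts with ob_refcnt (8 bytes)
# followed by ob_type (8 bytes).
POINTER = struct.Struct("<Q")
OB_TYPE_OFFSET = 8


def find_objects_of_type(memory, base_address, type_address):
    """Return addresses of candidate objects whose ob_type is type_address.

    memory is the raw content of one mapped region taken from the core file;
    base_address is the virtual address of memory[0].
    """
    needle = POINTER.pack(type_address)
    hits = []
    pos = memory.find(needle)
    while pos != -1:
        obj_offset = pos - OB_TYPE_OFFSET
        # The object header must start inside the region and be 8-byte aligned.
        if obj_offset >= 0 and (base_address + obj_offset) % 8 == 0:
            refcnt = POINTER.unpack_from(memory, obj_offset)[0]
            # Heuristic filter: a live object has a small positive refcount.
            if 0 < refcnt < (1 << 32):
                hits.append(base_address + obj_offset)
        pos = memory.find(needle, pos + 1)
    return hits


# Synthetic demo: one fake object header embedded in zeroed memory.
FAKE_TYPE = 0x00000000008A1B20   # hypothetical address of the function type
BASE = 0x7F0000001000            # hypothetical start of the mapped region
region = bytearray(64)
POINTER.pack_into(region, 16, 3)          # ob_refcnt = 3
POINTER.pack_into(region, 24, FAKE_TYPE)  # ob_type
candidates = find_objects_of_type(bytes(region), BASE, FAKE_TYPE)
print([hex(address) for address in candidates])
```

Each hit is only a candidate function object. The offset of its code-object pointer should be checked against Include/funcobject.h in the sources for the exact interpreter version before following it.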
I have a core file (produced via the gcore command) of a Linux python2.6 process. I need to extract the bytecode and decompile it.
Following on Steve's comment, you might want to take a look at Misc/gdbinit for some GDB command inspiration. You are correct that you won't have a running process, but I think you should be able to source that file (maybe with tweaks, depending on the Python version you are debugging). From there you can move up and down the C call stack, poke around in the C locals, and follow pointers to the currently active functions. For those which are Python functions, follow the func_code attribute to get the code object. I can't remember what the actual bytecode attribute is called in the code object. (It's been too many years.)
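For orientation, the same chain (function, then code object, then bytecode) can be inspected in a live interpreter. The sketch below is not from the thread; `sample` is a made-up function. The raw bytecode lives in the code object's `co_code` attribute, which is also the field name in the C `PyCodeObject` struct. On Python 2.6 the code object is reachable as `func_code`; `__code__` is an alias that also works on Python 3, so it is used here to keep the example runnable on current interpreters.

```python
import dis


def sample(a, b):
    return a + b


# Python 2.6 spells this sample.func_code; __code__ is the portable alias.
code = sample.__code__

# co_code holds the raw bytecode (str on Python 2, bytes on Python 3).
print(type(code.co_code), len(code.co_code))

# A decompiler needs more than co_code: constants and names are stored
# in separate tuples that the bytecode refers to by index.
print(code.co_consts, code.co_names, code.co_varnames)

# Human-readable disassembly of the code object.
dis.dis(code)
```

When working from a core file, this means recovering `co_code` alone is probably not enough; the objects behind `co_consts`, `co_names`, and `co_varnames` would have to be followed as well.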
However, these all seem to require either a running process and/or a binary with debugging symbols.
Yeah, you're going to have *a lot* of fun with a stripped executable. If you're debugging a core file from an interpreter compiled with much in the way of compiler optimization, many of the local variables will have been optimized out. You'll likely be stuck rummaging around until you figure out the pattern of where the compiler put things (register-wise).
I'm thinking that the compiled bytecode is likely in an array or contiguous block of memory within the python executable's image and that there's probably a way to pull it out with gdb. Unsurprisingly, the pyc 0xd1f20d0a magic number isn't kept in memory. So, how do I find the memory holding the compiled bytecode?
Correct. Module-level bytecode is executed once at import time and then discarded; at least, that used to be how it was done.

Skip
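As a small illustration of why searching the dump for the magic number is unlikely to help: the magic is part of the .pyc file header, not of the code object that the loader builds from the rest of the file. The sketch below assumes 62161 is the magic value for CPython 2.6, which is consistent with the byte sequence quoted in the question; the constant name is made up for the example.

```python
import struct

PY26_MAGIC_NUMBER = 62161  # assumed magic value for CPython 2.6 .pyc files

# On disk, the magic is the number as a little-endian 16-bit integer
# followed by the two bytes "\r\n".
magic = struct.pack("<H", PY26_MAGIC_NUMBER) + b"\r\n"
print(magic.hex())
```

The header bytes are consumed when the file is read, so the in-memory code objects carry no such marker and have to be located through object structures instead.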
participants (3): CompuTinker, Skip Montanaro, Steve Dower