[issue8380] Port of the gdb7 debugging hooks to the "py3k" branch

Wed Apr 21 23:29:11 CEST 2010

Dave Malcolm <dmalcolm at redhat.com> added the comment:

I'm attaching a new version of the patch, for the py3k branch.

I've gone back to breaking on "id" (i.e. "builtin_id"), as in my original patch.  I prefer it because it takes a single argument, which makes it very convenient to work with in the various tests; textiowrapper_write takes an args tuple, which makes things like corrupting the pointer slightly trickier.

The big change here is that I've changed the output format throughout to try to emulate Python 3 literals: a PyLongObject instance is now printed as digits, without a trailing "L".  I feel that the fact that gdb is running Python 2 is really just an implementation detail, and that the pretty-printer ought to print in a format reflecting the language being debugged.

This also removes the 'u' prefix from strings, and I've added tests for 'bytes' (which get a "b" prefix).  I've also (I believe) correctly implemented Python 3's literal representation for empty and non-empty sets and frozensets (e.g. "{1, 2, 3}", as opposed to Python 2's "set([1, 2, 3])").
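To illustrate the set/frozenset formatting rules described above, here is a minimal sketch.  It is not the code from the patch: py3_set_repr is a hypothetical helper, and it assumes the element reprs have already been rendered as strings.

```python
def py3_set_repr(type_name, element_reprs):
    """Format a set or frozenset the way Python 3's repr() does.

    Hypothetical helper for illustration only (not taken from the patch).
    type_name is "set" or "frozenset"; element_reprs is a list of
    already-rendered element strings, in the order they should appear.
    """
    if not element_reprs:
        # Python 3 has no literal for an empty set: "{}" is an empty dict.
        return "%s()" % type_name
    body = "{" + ", ".join(element_reprs) + "}"
    if type_name == "set":
        return body
    # Non-empty frozensets wrap the set literal: frozenset({1, 2, 3})
    return "%s(%s)" % (type_name, body)
```

The empty case is the one that is easy to get wrong, since it has to fall back to the constructor form for both types.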

More controversially, a PyUnicodeObject instance is printed using an emulation of Python 3's unicode_repr algorithm.  This means that gdb writes unicode to sys.stdout, and so may print non-ASCII characters, using the encoding of sys.stdout.  This will only work if gdb's encoding is set to something that can cope with said characters:

Python 3.2a0 (py3k:80312M, Apr 21 2010, 17:00:02) 
[GCC 4.4.3 20100127 (Red Hat 4.4.3-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> id('文字化け')

Breakpoint 1, builtin_id (self=<module at remote 0x7ffff7fd7df8>, v='文字化け') at Python/bltinmodule.c:912
912		return PyLong_FromVoidPtr(v);

Note the unicode characters in the rendering of "v" in the breakpoint.

I suspect that this is a change too far (for example, I'm assuming a UTF-8 locale).

Any suggestions on what the output should look like for the unicode case?  

Would it be better if I coerce everything back to an escaped literal syntax that's encodable as ASCII?  That would probably avoid encoding and locale issues, but lose immediate readability for people able to read non-ASCII scripts.
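One possible middle ground, as a rough sketch only: keep the readable form when the output encoding can represent it, and fall back to ASCII escapes otherwise.  render_repr is a hypothetical helper rather than anything in the patch, and it is written for Python 3 for brevity, whereas the hooks themselves run under gdb's Python 2.

```python
def render_repr(text, encoding):
    """Return a repr of text that can safely be written using encoding.

    Hypothetical helper for illustration only (not taken from the patch).
    """
    quoted = repr(text)
    try:
        # Probe: can the target encoding represent every character?
        quoted.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Unencodable characters or an unknown encoding name:
        # fall back to backslash escapes that are pure ASCII.
        return quoted.encode("ascii", "backslashreplace").decode("ascii")
    return quoted
```

With this approach, render_repr('文字化け', 'utf-8') leaves the string readable, while render_repr('文字化け', 'ascii') gives '\u6587\u5b57\u5316\u3051'.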

All tests pass with both UCS2 and UCS4 builds on this Fedora 12 x86_64 box, building with --with-pydebug in both cases.

----------
Added file: http://bugs.python.org/file17031/port-gdb7-hooks-to-py3k-002.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8380>
_______________________________________