[pypy-dev] cpyext: Detecting pypy and other issues

Roger Binns rogerb at rogerbinns.com
Sat May 14 16:24:00 CEST 2011


On 05/14/2011 05:39 AM, Armin Rigo wrote:
> I think the general problem is that you are trying to approach
> debugging PyPy like you approach debugging a C-written program (say,
> CPython).

Note that I am not trying to debug pypy itself, but an extension written in
C and compiled with cpyext.

> It is a
> bit like, say, wanting to debug various memory issues in a Java
> program by running the Java VM in gdb.  

It is like trying to debug JNI issues, not Java program issues.  To debug
JNI you do use a regular C-based debugger like gdb.  Some JVMs also have a
checking mode for JNI calls:

 http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/html/jni_debug.html
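
For example, both the IBM JVM and Sun's HotSpot accept a -Xcheck:jni flag
that validates JNI calls at runtime (MyApp below is just a placeholder):

    java -Xcheck:jni MyApp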

> One thing
> we could add to cpyext is a reference count checker: a special mode in
> which cpyext doesn't immediately free the CPython objects whose
> refcount drops to zero, but instead checks for a while if the object
> is still used, and cleanly complain if it is.  This can be added to
> the RPython source code of cpyext; it's not something that should be
> debugged in gdb, for the reasons above.

I'm all for a checking mode.  Note that you'll still have to provide some
way of interacting with the debugger - it would be rather pointless to emit
a message saying "pyobject misuse at address 0x12345678" and then exit,
since you still need to establish who allocated the object and who reused
it after death.
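
Something along these lines would already help.  This is only a sketch of
the deferred-free idea, written from the C side purely for illustration
(the real thing would live in cpyext's RPython source), and every name in
it is made up:

    #include <stdio.h>
    #include <Python.h>

    #define QUARANTINE 256

    static PyObject *ring[QUARANTINE];
    static size_t ring_pos;

    /* Stand-in for whatever the real release path is. */
    static void really_release(PyObject *op)
    {
        PyObject_Free(op);
    }

    /* Called instead of freeing an object whose refcount just hit
       zero: park it with a refcount of 0 and only release it once it
       has sat in the quarantine ring for a while. */
    static void checker_defer_free(PyObject *op)
    {
        PyObject *old = ring[ring_pos];
        if (old != NULL) {
            /* if the parked refcount moved, someone kept using a
               dead object via Py_INCREF/Py_DECREF */
            if (old->ob_refcnt != 0)
                fprintf(stderr,
                        "pyobject misuse: %p (type %s) used after its "
                        "refcount dropped to zero\n",
                        (void *)old, Py_TYPE(old)->tp_name);
            really_release(old);
        }
        op->ob_refcnt = 0;
        ring[ring_pos] = op;
        ring_pos = (ring_pos + 1) % QUARANTINE;
    }

The fprintf there is exactly the point where you would want to hand control
to a debugger, or at least record enough context to answer those questions.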

> If nevertheless you want to use C-level debugging tools, then it's
> going to be painful:

*Any* debugging tools would be fine, but since C extensions are full of C
code you may as well make it possible to use C tools on that code.

>> - Some gdb macros to make debugging easier.  For example CPython comes with
>> a .gdbinit with nice macros like 'pyo' so you can see what a PyObject *
>> represents
> 
> This is impossible in general, because the C structures corresponding
> to objects changes depending on e.g. whether you included some special
> compilation options or not.  It would be possible, though, to come
> with some macros that work in the common case.  This might require
> additional hacks to fix the C name of some helper functions, so that
> they can be called by the macros.  In other words it's something that
> someone may come up with at some point (there was some work in that
> direction), but it's work.

The exact same issues apply to CPython.  CPython exports a helper function
named _PyObject_Dump(PyObject *) which calls into the internal Python
machinery (str etc).  'pyo' is the only macro I use, and it is essential
because PyObject * pointers are all over the place and you need some idea
of what they are.
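
For anyone who hasn't seen it, the macro amounts to something like this
(from memory, not the exact CPython .gdbinit):

    define pyo
        # _PyObject_Dump writes the object's repr, type, refcount and
        # address to stderr
        call _PyObject_Dump($arg0)
    end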

>> - How to disable all memory optimisations and make it work with valgrind
> 
> That's not possible.  The minimark GC is not a "memory optimisation";
> it's a completely different approach to garbage collection than, say,
> CPython's reference counting.

It is actually possible.  All valgrind needs to know is which areas of
memory are in use and which aren't.  By far the easiest way is to fall back
to plain C library malloc and free calls.  That is why I mentioned the
refcounting GC - at the end of the day it would just be malloc and free
calls.

Since pypy doesn't have a functioning "dumb" memory mode, the only
alternative is to add valgrind client requests.  Valgrind provides a header
whose macros expand to instruction sequences that have no side effects when
the program is run normally, but tell valgrind things when run under
valgrind.  See http://valgrind.org/docs/manual/manual-core-adv.html

For example, the memory allocation code would need to call
VALGRIND_MALLOCLIKE_BLOCK for each allocated chunk of memory, and the GC
would need to call VALGRIND_FREELIKE_BLOCK on each freed chunk.
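
As a rough sketch of what that means in practice - gc_alloc and gc_reclaim
are made-up names standing in for whatever the GC really uses, and the
redzone size is arbitrary:

    #include <stdlib.h>
    #include <valgrind/valgrind.h>

    #define REDZONE 16   /* padding valgrind treats as off-limits */

    static void *gc_alloc(size_t nbytes)
    {
        /* the real GC would carve this out of its own arena */
        char *block = malloc(nbytes + 2 * REDZONE);
        char *user = block + REDZONE;
        /* tell valgrind a malloc-like block of nbytes now lives at
           'user'; the trailing 0 means it is not zero filled */
        VALGRIND_MALLOCLIKE_BLOCK(user, nbytes, REDZONE, 0);
        return user;
    }

    static void gc_reclaim(void *user)
    {
        /* tell valgrind the block is dead before releasing it */
        VALGRIND_FREELIKE_BLOCK(user, REDZONE);
        free((char *)user - REDZONE);
    }

With something like that in place valgrind can track the GC's heap - use
after free, leaks and so on - just as it does for plain malloc.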

>> - How to get deterministic behaviour - ie exactly the same thing happens
>> each time whether you run gdb or not
> 
> I don't know the details, but there are several factors that make it
> impossible.  The most important one is, again, the fact that the
> minimark GC triggers collections at times that look random.  I don't
> see how to fix that.

I'd be very happy to have the GC triggered at every possible point and do a
complete collection.  That is best for a checking mode, since it keeps the
time between when something is no longer used and when it is collected as
short as possible.

You could probably trigger it before and after every cpyext-wrapped call.
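
Purely to illustrate the bracketing I mean - the real hook would of course
live inside cpyext's RPython wrappers rather than in extension C code, and
force_full_collect is a made-up name:

    /* hypothetical hook into the GC */
    void force_full_collect(void);

    #define CHECKED_CPYEXT_CALL(stmt)  do {  \
            force_full_collect();            \
            stmt;                            \
            force_full_collect();            \
        } while (0)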

> What we are
> missing is a set of tools that let you locate and debug issues with
> CPython C extension modules, but not necessarily based on gdb.

Indeed.  At the moment my code works perfectly under CPython, has 99.6% test
coverage, is valgrind-clean under CPython, and only broke the refcount rules
in one internal debug method used for fault injection in special builds.
That was fairly easily diagnosed and fixed.  Under pypy I'm hitting all
sorts of problems, including a pypy crash, nonsensical behaviour, and gdb
changing program behaviour merely by being attached.  Until pypy is perfect
I won't be the only one affected by this!

Roger

