[pypy-dev] cpyext: Detecting pypy and other issues

Armin Rigo arigo at tunes.org
Sat May 14 14:39:21 CEST 2011


Hi Roger,

On Fri, May 13, 2011 at 7:47 AM, Roger Binns <rogerb at rogerbinns.com> wrote:
>>> Are you using callgrind or using valgrind for memory checks? The
>>> former should work, the latter is rather pointless I think, because
>>> RPython manages memory on its own.
>
> The point is to use valgrind.

I think the general problem is that you are trying to approach
debugging PyPy like you approach debugging a C-written program (say,
CPython).  You come with a number of expectations of being able to do
certain things, which don't work well or at all in this case.  It is a
bit like, say, wanting to debug various memory issues in a Java
program by running the Java VM in gdb.  It may eventually work, but
it's a *lot* of pain, and you need to be very familiar with how the VM
works internally.

I know that gdb is essential to make CPython C extensions work.  I
believe it is also a limitation of cpyext that it makes gdb'ing
through the PyPy-compiled C extensions so painful.  One thing we
could add to cpyext is a reference count checker: a special mode in
which cpyext doesn't immediately free the CPython objects whose
refcount drops to zero, but instead checks for a while whether the
object is still used, and complains cleanly if it is.  This can be
added to the RPython source code of cpyext; it's not something that
should be debugged in gdb, for the reasons above.
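To make the idea concrete, here is a minimal sketch of such a
"deferred free" checker, in plain Python.  All the names here
(RefcountChecker, deferred_free, the read_memory callback) are
invented for illustration; cpyext's real version would be written in
RPython and hook into its actual allocator:

```python
# Hypothetical sketch: instead of freeing an object the moment its
# refcount hits zero, keep it quarantined for a while together with a
# snapshot of its memory, and complain if anything touched it later.

class QuarantinedObject:
    def __init__(self, address, snapshot):
        self.address = address
        self.snapshot = snapshot  # bytes as they were at "free" time

class RefcountChecker:
    def __init__(self, delay=100):
        # how many subsequent frees to wait before really freeing
        self.delay = delay
        self.quarantine = []

    def deferred_free(self, address, read_memory):
        # Record the object's memory instead of freeing it immediately.
        self.quarantine.append(QuarantinedObject(address, read_memory(address)))
        while len(self.quarantine) > self.delay:
            self._check_and_free(self.quarantine.pop(0), read_memory)

    def _check_and_free(self, obj, read_memory):
        # If the memory changed since the refcount dropped to zero,
        # someone used the object after its logical death.
        if read_memory(obj.address) != obj.snapshot:
            raise AssertionError(
                "object at %#x was used after its refcount dropped to zero"
                % obj.address)
        # here the object would really be freed
```

The point is exactly the one above: this kind of checking lives at
the level of cpyext's own source, not inside gdb.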

If nevertheless you want to use C-level debugging tools, then it's
going to be painful:

> - How to do a debug build that retains all symbols and line number information

That part is easy: you need to go in the C sources produced in
/tmp/usession-*/testing_1, and type "make lldebug".  Maybe you want to
disable "obmalloc.c", as you'd do in CPython; I think it's some other
predefined target in the Makefile.
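In shell terms, that comes down to something like the following
(the usession directory name varies per run, and the exact Makefile
target for disabling obmalloc.c is an assumption to check in the
generated Makefile):

```shell
# cd into the C sources produced by the translation.
# The usession-* directory name varies; adjust to your own run.
cd /tmp/usession-*/testing_1

# Debug build that keeps symbols and line number information:
make lldebug

# To find the target that disables obmalloc.c (as you would do in
# CPython), inspect the predefined targets in the generated Makefile:
grep -n 'debug\|obmalloc' Makefile
```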

> - Some gdb macros to make debugging easier.  For example CPython comes with
> a .gdbinit with nice macros like 'pyo' so you can see what a PyObject *
> represents

This is impossible in general, because the C structures corresponding
to objects change depending on e.g. whether you included some special
compilation options or not.  It would be possible, though, to come up
with some macros that work in the common case.  This might require
additional hacks to fix the C name of some helper functions, so that
they can be called by the macros.  In other words, it's something that
someone may come up with at some point (there was some work in that
direction), but it's work.

> - How to disable all memory optimisations and make it work with valgrind

That's not possible.  The minimark GC is not a "memory optimisation";
it's a completely different approach to garbage collection than, say,
CPython's reference counting.

> - How to get deterministic behaviour - ie exactly the same thing happens
> each time whether you run gdb or not

I don't know the details, but there are several factors that make it
impossible.  The most important one is, again, the fact that the
minimark GC triggers collections at times that look random.  I don't
see how to fix that.

> Trying the refcount gc was to address the last two since it should result in
> objects being freed the moment they are not used and not being dependent on
> when GC is run or what address spaces it is run over.

The refcount GC included in PyPy is not suited to run a complete
pypy-c.  It's there for historical reasons, and it's still used by a
number of tests, but it doesn't support weakrefs for example, nor the
GCREF type, which is essential for the JIT.  I fear there is no more
way to get a "completely deterministic" behavior in gdb for pypy-c
than there is for a Java VM, for the same reasons.  What we are
missing is a set of tools that let you locate and debug issues with
CPython C extension modules, but not necessarily based on gdb.
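The difference in determinism is easy to see from Python itself.
Under CPython's reference counting, a __del__ runs the instant the
last reference disappears; under a tracing GC like minimark, it only
runs at some later collection, at a point that looks random from the
program's perspective:

```python
# Under CPython, __del__ fires deterministically, right when the last
# reference goes away.  Under PyPy, it fires only at a later garbage
# collection (which gc.collect() forces here for demonstration).

import gc

freed = []

class Tracked:
    def __del__(self):
        freed.append("freed")

obj = Tracked()
del obj        # CPython: __del__ runs right here (refcount hits zero)
gc.collect()   # PyPy: __del__ typically runs only after a collection
```

This is exactly why a C extension that implicitly relies on objects
dying "immediately" behaves differently, and less reproducibly, on
pypy-c.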


A bientôt,

Armin.
