[Cython] line tracing/profiling code objects

Stefan Behnel stefan_ml at behnel.de
Thu Feb 27 22:06:38 CET 2014


Hi!

Syam Gadde, 27.02.2014 16:22:
> I tried using line tracing in conjunction with line_profiler/kernprof to 
> attempt to see if line-based profiling works in Cython.  I think it's 
> pretty close.  But I think the problem is that line_profiler is getting 
> a different PyCodeObject when it wraps the function and when it actually 
> gets the trace call.  It adds the initial code object to a map, and 
> later when it gets the trace call, decides that the trace call is not 
> something it needs to pay attention to because the new code object that 
> it gets is not the same as the original one.
> 
> The first code object is created at function declaration by 
> __Pyx_PyCode_New (called in __Pyx_InitCachedConstants) and is assigned to 
> a variable __pyx_k_codeobj_NNN.  The second code object is created, 
> essentially, during the first entry into the function (in 
> __Pyx_TraceCall, via __Pyx_TraceSetupAndCall).  It seems that setting 
> __pyx_frame_code to the initial code object before calling TraceCall() 
> would fix this.
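
To make the mismatch concrete, here is a rough Python sketch of a profiler
that only records frames whose code object it has seen before. This is a toy
model, not line_profiler's actual implementation: ToyLineProfiler, work,
work_copy and total_hits are all made-up names, and the "second code object"
is emulated with code.replace() (Python 3.8+) rather than created by Cython.

```python
import sys
import types


class ToyLineProfiler:
    """Toy stand-in for a line profiler; not line_profiler's real code."""

    def __init__(self):
        self.hits = {}  # registered code object -> {line number: hit count}

    def add_function(self, func):
        self.hits[func.__code__] = {}

    def _trace(self, frame, event, arg):
        counts = self.hits.get(frame.f_code)
        if counts is None:
            return None  # unregistered code object: skip this frame
        if event == "line":
            counts[frame.f_lineno] = counts.get(frame.f_lineno, 0) + 1
        return self._trace

    def run(self, func, *args):
        sys.settrace(self._trace)
        try:
            return func(*args)
        finally:
            sys.settrace(None)


def work(n):
    total = 0
    for i in range(n):
        total += i
    return total


# Same byte code, but a distinct (and unequal) code object, which is roughly
# the situation the trace callback ends up in. Needs Python 3.8+ for replace().
work_copy = types.FunctionType(
    work.__code__.replace(co_name="work_copy"), globals())


def total_hits(func_to_run):
    profiler = ToyLineProfiler()
    profiler.add_function(work)  # only work's code object is registered
    profiler.run(func_to_run, 3)
    return sum(profiler.hits[work.__code__].values())
```

Running total_hits(work) records line events, while total_hits(work_copy)
records none, even though both functions execute the same code.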

That's part of it, yes. Here's another piece of the puzzle:

https://github.com/cython/cython/pull/93

The problem is that the code objects that we currently create along the way
have a fixed Python code line number (because that was the simplest way to
get it working). The two changes above will get us pretty far. What remains
to be done then is to enable line number calculation at runtime by
emulating byte code offsets in the frame object instead of using absolute
line numbers. The pull request refers to CPython's
Objects/lnotab_notes.txt, which has some details on the inner workings.
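
As a rough illustration of what "emulating byte code offsets" means, the
lookup over an lnotab-style table can be sketched in Python as below. This
assumes the classic format of unsigned (address increment, line increment)
byte pairs; newer CPython versions changed the encoding, so the function
works on a hand-built table rather than a real co_lnotab. The name
line_for_offset and the sample table are made up for this example.

```python
def line_for_offset(lnotab, first_lineno, offset):
    """Map a byte code offset to a source line using an lnotab-style table."""
    lineno = first_lineno
    addr = 0
    # The table is a flat sequence of (address increment, line increment)
    # byte pairs, both treated as unsigned here (an assumption, see above).
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        if addr > offset:
            break  # this entry starts past the requested offset
        lineno += line_incr
    return lineno


# Illustrative table: offsets 0, 6 and 50 start lines +1, +2 and +7
# relative to the first line number of the code object.
sample_table = bytes([0, 1, 6, 1, 44, 5])
```

With first_lineno=10, offsets 0, 6 and 50 map to lines 11, 12 and 17, and any
offset between 6 and 49 still maps to line 12. A frame that reports such an
offset lets a single code object serve every line of the function.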

Properly calculating line numbers at runtime would also have other nice
side effects, because it would remove the need to create multiple code
objects for a single function in the first place. So it's very desirable.


> Is this easy to do?  I'd do it myself, but I'd need help figuring out 
> how to get the name of the appropriate __pyx_k_codeobj_NNN variable from 
> within FuncDefNode.generate_function_definitions(), which calls 
> put_trace_call().

The best way is usually to run it through a debugger and see what you can
get your hands on. :) The constant names (__pyx_k_...) are generated on the
fly at C code generation time, so you can't get them before that. Two
possible solutions: either change how the C names of code objects
are being generated (e.g. by making their C name depend on the mangled C
name of the function so that they can be deterministically named at
analysis time), or make the code object node remember its generated
constant name and reuse it in other places.
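
The two options can be sketched in plain Python roughly as follows. None of
these classes or functions exist in Cython under these names; ConstantNamer,
deterministic_codeobj_name, CodeObjectNode and the "__pyx_codeobj_" prefix
are hypothetical and only model the idea.

```python
import itertools


class ConstantNamer:
    """Hypothetical stand-in for a generator that numbers constants on the fly."""

    def __init__(self):
        self._counter = itertools.count(1)

    def new_codeobj_name(self):
        # The name depends on how many constants were requested before this one,
        # so it is unknown until code generation actually reaches it.
        return "__pyx_k_codeobj_%d" % next(self._counter)


def deterministic_codeobj_name(func_cname):
    """Option 1: derive the name from the function's mangled C name."""
    return "__pyx_codeobj_" + func_cname  # hypothetical prefix


class CodeObjectNode:
    """Option 2: allocate a name once and remember it for later reuse."""

    def __init__(self):
        self.cname = None

    def get_cname(self, namer):
        if self.cname is None:
            self.cname = namer.new_codeobj_name()
        return self.cname
```

With option 1 the name is available as soon as the function's C name is
known. With option 2 every caller that holds the node gets the same name
back, but only after the first request has been made at generation time.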

As you can see, it's not entirely trivial overall, but we've already been
piling up the bits and pieces so that the only remaining effort is to put
everything together. If you could give it a try, I'm sure you'd make a lot
of people happy with it.

Stefan