
On 09/08/2012, Tim Henderson <tim.tadh@gmail.com> wrote:
Fortunately, I don't need a complete call graph. To get started I really only need procedure level dependency graphs (with control and data-dependencies). This graph is essentially a CFG with extra edges for different types of data dependencies. Thanks to Alex's previous reply I am taking a look at the CPython bytecode compiler to see if I can extract the local CFG's it produces.
Although pypy's own compiler creates a block structure and allows you to do post-order traversal of the blocks, I've found the easiest way to get the CFG of a single code object to be to use the logic found within the dis module. Here's an example of some old code that does that, somewhat stripped down and provided without tests I'm afraid because I kind of got bored of it. It's slightly more involved than dis.findlabels, part of a large piece of nonsense for deriving dataflow equations from python at app-level. https://gist.github.com/3300866 Functions OP_FN are not hard to specify but I've got them intermingled with other, much more complicated code here, they use the EffectContext to store the residual behaviour. Stack depth is completely statically determinable, and in 2.5 or later, it's determinable without looking at other code objects. I think other people have managed to do this, by the way - I have the vague recollection that Prof. Yanhong Liu at Stony Brook wrote a tool to extract datalog facts from python, although with hand-coded flow for built-ins; if you're only interested in intraprocedural analysis (often next-to-useless in python) I guess that would be fine. But I can't seem to find that, now. -- William Leslie