[IPython-dev] Splitting input by AST nodes

Sun Apr 3 02:55:31 EDT 2011

Howdy,

On Sat, Apr 2, 2011 at 3:02 PM, Thomas Kluyver <takowl at gmail.com> wrote:
> Our current method of splitting input into 'blocks' is...well, one of the
> comments starts "HACK!!!". It's also the cause of at least issue 306*, and
> trying to change it is likely to lead us into a minefield of other similar
> problems.
>
> I propose that we redo this with a system making use of the AST (abstract
> syntax tree) capabilities introduced in Python 2.6. Using the ast module, we
> can parse a code string into an AST, and then use the nodes from that
> instead of code blocks. AST nodes can be compiled into standard Python code
> objects to be run with exec. This has a couple of consequences which I've
> thought of so far: firstly, there's no easy way to get back the code as a
> string from an ast node (third party modules exist, but I get the impression
> they're rather hackish themselves). So we could no longer do things like "if
> the last node is two lines or less, execute it differently". Secondly, "a =
> 11; b = 12" is currently understood as one block, but in the AST it would be
> two nodes.
>
> Since this would obviously be a fairly major change to a key part of
> IPython, I thought I'd get some feedback before I start trying to implement
> it. Does using AST nodes instead of blocks sound useful, or is there a good
> reason why it wouldn't work?

This is a great idea in principle, but as Brian points out, the issue
is the extended IPython syntax.  As you can see in inputsplitter, the
split_blocks method works by calling the .push() method, and the
IPythonInputSplitter subclass overrides this method to extend what is
considered valid syntax, by transforming the IPython-specific input
(magics, shell escapes and the like) into plain Python first.
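
To make both points concrete, here is a minimal sketch (modern Python 3,
standard library only, not IPython's code).  The helper names
split_into_nodes, run_nodes and is_plain_python are made up for
illustration.  It shows that "a = 11; b = 12" parses into two top-level
nodes, that each node can be compiled and exec'd on its own, and that
IPython-only syntax fails to parse until it has been transformed:

```python
import ast


def split_into_nodes(source):
    """Parse plain Python source and return its top-level AST nodes."""
    return ast.parse(source).body


def run_nodes(nodes, namespace):
    """Compile and exec each top-level node separately (illustrative only)."""
    for node in nodes:
        # Wrap the single node in a Module so compile() accepts it.
        # type_ignores is required on Python 3.8 and later.
        module = ast.Module(body=[node], type_ignores=[])
        code = compile(module, "<input>", "exec")
        exec(code, namespace)


def is_plain_python(source):
    """Return True if the source parses as plain Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


nodes = split_into_nodes("a = 11; b = 12")
ns = {}
run_nodes(nodes, ns)
print(len(nodes), ns["a"], ns["b"])

# IPython-only syntax is not valid Python, so the parser rejects it.
print(is_plain_python("x = 1"), is_plain_python("!ls -l"))
```

So any AST-based splitter can only run after the IPython-specific
syntax has been rewritten into valid Python.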

Now, this doesn't mean the job can't be done: the good thing here is
that we have a very solid test suite for this code, and test coverage
is excellent:

(master)dreamweaver[tests]> nosetests --with-coverage \
    --cover-package=IPython.core.inputsplitter -vvs test_inputsplitter.py

[...]

Name                         Stmts   Exec  Cover   Missing
----------------------------------------------------------
IPython.core.inputsplitter     296    294    99%   520-521
----------------------------------------------------------------------
Ran 83 tests in 0.104s

OK

In fact, before doing anything I'd *strongly* recommend getting that
back to 100% by adding a test that exercises lines 520-521.
You wouldn't believe how many bugs I found by pushing test
coverage from 98% or 99% back up to 100% when I wrote that stuff...  And
those lines are in the heart of split_blocks, so it would be best to
start with full coverage before breaking anything.

Now, the job *can* be done; the trick will be to decouple all the
syntactic transformations from the block analysis.  And keep in mind
that .push(), or at least push_accepts_more(), is a key part of the
public API, because line-oriented frontends need it (or something like
it) to decide when to stop accepting input and start execution.
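
A rough sketch of that decoupling, under stated assumptions:
transform_line is a hypothetical stand-in for the real transformations
(it only handles a leading "!", and the get_ipython().system(...) form
it emits is illustrative), and accepts_more is only loosely modeled on
push_accepts_more(), using the standard library's
codeop.compile_command rather than anything in inputsplitter:

```python
import ast
import codeop


def transform_line(line):
    """Hypothetical transformation: rewrite '!cmd' as a plain Python call."""
    stripped = line.lstrip()
    indent = line[:len(line) - len(stripped)]
    if stripped.startswith("!"):
        return f"{indent}get_ipython().system({stripped[1:]!r})"
    return line


def transform_source(source):
    """Stage 1: purely syntactic, line-by-line rewriting."""
    return "\n".join(transform_line(line) for line in source.split("\n"))


def split_blocks(source):
    """Stage 2: block analysis on the AST of the already-transformed source."""
    return ast.parse(transform_source(source)).body


def accepts_more(source):
    """Return True if a line-oriented frontend should keep reading input."""
    try:
        code = codeop.compile_command(transform_source(source), "<input>")
    except (SyntaxError, ValueError, OverflowError):
        # Invalid input: stop reading and let execution report the error.
        return False
    # compile_command returns None when the input is incomplete.
    return code is None


print(len(split_blocks("!ls\nx = 1")))
print(accepts_more("for i in range(3):"), accepts_more("x = 1"))
```

Because the transformation runs first, the AST stage only ever sees
plain Python.  A per-line rewrite like this one would also mangle a "!"
at the start of a line inside a multi-line string, so a real
implementation would need more care there.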

But while it's true that this code is critical, it's also so well
tested that I'm not terribly concerned about you trying to improve it.
If you can keep that test suite passing first (and later obviously
the whole system-wide one), we should be in good shape.

Before you start though, in addition to 100% coverage, have a look at
the test suite and see if you can think of any important cases we
might have missed.  I tried to cover all the bases, and over time
Robert has also added more tests, but spend a bit of time thinking about
whether we missed something.  If you make that test suite really good,
ultimately that's all we care about.  The internal implementation is
really a detail, and even if you need to juggle around the API a
little bit it's no big deal (you can fix the calling code as needed).
Those tests are our 'contract' on what we do syntax-wise and
block-splitting-wise, so as long as you can still pass them, how you get
there is up to you :)

And getting rid of that double-pass hack would be very good indeed.

Needless to say, do this on a branch ;)

Good luck, and keep us posted!

f