[IPython-dev] Code review (mostly for Fernando)

Wed Sep 15 01:59:09 EDT 2010

Hey,

[ for the list - Brian just did a very thorough review of a ton of
recent code I wrote while he was traveling.  Many thanks go to him for
the careful and time-consuming job, I'm replying on list further so
that we have a public record of whatever design decisions arise from
this, since some of the code touches on fairly core parts of a design
we're trying to stabilize for the long haul].

On Mon, Sep 13, 2010 at 22:22, Brian Granger <ellisonbg at gmail.com> wrote:
> Fernando,
>
> I went through all of your commits over the last few weeks and did a
> code review.  I focused on stuff in the kernel, but I did look over
> almost all of the commits.  Overall the code looks great, fantastic
> work!
>
> Here are my comments:
>
> http://github.com/ipython/ipython/commit/60bf09757856f7769eb79c2581a86656de4d275a
>
> Does the kernel know anything about cell mode?  If so, we need to go over this
> more carefully. Added later: I now see that the kernel does know about cell
> mode. We should talk more about this.

Yes, what we need to do is rationalize the number of 'run*' entry
points we have.  Those have accreted over time, starting with the old,
convoluted and unnecessary approach that we inherited from code.py
(push()->runsource()->runcode()).

I added run_cell separately just as a new, safe place to see if we
could get the right semantics in place.  I'm very happy with the
user-facing behavior we get, but we should all agree on whether my
happiness is justified, and separately, with what to do with the API
to keep the good behavior but with a less cahotic underpinning.

The behavior I implemented (the one I like) is that run_cell() takes a
multiline string and does:

- if it's a single statement, compiles it and runs it in 'single' mode
(tiggering sys.displayhook).

- if it contains multiple blocks, it decides:
  a. If the last one looks like a very compact statement (one or two
lines of source): all but the last are joined and executed in 'exec'
mode (no sys.displayhook) and the last is executed in 'single' mode.
  b. Otherwise, the entire cell goes in 'exec' mode, much like a script.

In practice, I really like this behavior: we get the convenience of
Out[] results for either single statements or the last thing in a
cell, which is nice, without the clutter of multiple Out[] values
coming out of a single cell.

The basic idea is: one 'cell' of input will give at most one Out[]
value, and it will correspond to the last statement in the cell.

Having used it, I think it's a good human interface.

The first question is then: do we agree on this as the behavior we
want going forward?  (independent of how it's done under the hood).
My vote is obviously yes, but I'm totally open to feedback, of course.

Now, on to the important issue of our weedy garden of run*() calls...
We currently have:

newkernel)uqbar[core]> grep 'def run' interactiveshell.py
    def run_cell(self, cell):
    def runlines(self, lines, clean=False):
    def runsource(self, source, filename='<input>', symbol='single'):
    def runcode(self, code_obj, post_execute=True):

Let's think a little about how many methods and with what semantics we
want, though obviously we need to be careful in the transition period
not to break the existing code. We can use as a marker that the new
api uses run_ names (with underscore) and we'll eventually
deprecate/remove the run* ones (no underscore) we inherited from
code.py (and to which we added runlines).

Here's a first pass at the problem.  We must keep in mind that we want
an API that's suitable for non-python clients, that are incapable of
using inputsplitter to do syntax transformations, so they can only
feed us raw 'ipython language' source (e.g. a web client).

1. execute_code(pycode, mode='exec')  # pretty much what run_code is today
  - no history management
  - pycode is either a compiled code object or python source code.  If
it doesn't just compile(), a syntax error is raised and that's it; no
source transformations are applied.
  - since this is just a wrapper around the exec() call, I think we
should rename it to execute_code()
  - if pycode is compiled code, the mode flag is ignored.  If it's
source, it determines the compilation mode (see below).

2. run_cell(ipycode):
  - transforms runs input through syntax transformer
  - manages history
  - makes actual call to execute_code()
  - exposes the semantics for single/exec modes I outlined above, but
does so by simply calling execute_mode() with the necessary mode
flags.

I'm using the loose naming convention: 'execute -> low-level,
basically a wrapper around exec()' and 'run -> higher level, manages
history, extended syntax, etc'.  I'm not crazy about the fact that our
kernel api currently uses execute() as its entry point for all
executions, but perhaps that's OK.  The messaging spec doesn't need to
match every detail of the functional API.  Dissent welcome, though.

Thoughts?

> http://github.com/ipython/ipython/commit/c3bafd129ec29601280dcab4a7bd9008ce35dd15
>
> Last summer we decided to move away from having separate files for each platform
> for modules in utils. The code in _process_* is pretty short, any reason to
> not simply put it all in a single file like the other modules in utils?

I honestly don't remember the decision from last summer, sorry :)
Could you remind me of the reasoning, if you recall?  I guess I'm a
little allergic to

if sys.platform...:
  def f()

else:
  def f()

styles...  But I agree that the diamond diagram we have with four
files (_process_common -> (_process_win, _process_posix) -> process)
isn't exactly pretty either, so I'm open to reconsidering.

We can change it back once things settle down with what Evan is doing
and I run the full tests with pexpect on win32: at that point we'll
know if the _win code needs to grow a lot.  Because once it gets big
enough, I do think the separate files are better than the if/else
approach.  So how about this: let's wait a little, and if once the
dust settles the actual files remain small enough, we'll merge them
back in.

> http://github.com/ipython/ipython/commit/566d18a2fa3f7c79c68c0597f1e4d13c34b580dd
> http://github.com/ipython/ipython/commit/380ef8fffb3cac803d291d64359bd34a1efc5935
> http://github.com/ipython/ipython/commit/94057a1f613b31d4cdec7cea2db485577581916e
>
> Beautiful feature!

Thanks :)

> We should talk about the post_execution stuff.  See the problem it is solving
> in this case, but not sure about the solution.

Oh, I'm very much not sure about it either.  This is probably the
major thing I added that I knew we'd need to revisit; at first I just
wanted to see if we could pull it off at all.

My thinking is actually moving closer and closer to using context
managers.  We basically offer:

- pre-execution hooks
- post-execution stuff (in the ugly/temporary approach I used above)
- custom exception handlers.

That smells *a lot* like a context manager wrapping our exec calls.
But I'm a little afraid of jumping fully into context managers for
that (the semantics of nested managers, when dealing with exceptions
are highly non-ideal in python2.6, though quite a bit improved in
2.7).

I ran the idea by Min at lunch and he liked it, but it really needs
more thought.  On the one hand we're doing something *so close* to
context managers that it seems silly not to just use that.  On the
other hand I'm worried about the need for tighter exception control
than we get from the context manager protocol.

> When I first saw "paste", it was confused by its name. The word "paste" has
> a very generic meaning for all of us (as in cut/copy/paste) and I think
> something as specific as matplotlib inline figure insertion should have a more
> specific name...unless we envision the paste function becoming more generic.

Yes, paste is not ideal and I don't like it either.  I was thinking
'pastefig' earlier today: it's pretty explicit and matches the
sound/feel of 'savefig', already in mpl.  How does that sound?

> http://github.com/ipython/ipython/commit/888820b14f618dd6fee62c216aad4896694241f2
>
> I don't remember the details, but I thought that the zmqshell needed to have
> different logic in init_io, but I see that you have removed the init_io from
> zmqshell. Can you explain this to me?

Well, I just looked at the code and, as written, it was a near
duplicate of the parent.  The only difference is that it didn't use on
win32 the ANSI-enabled console, but that just adds an extra feature to
support ANSI escapes and is otherwise identical, so it didn't seem
necessary to have an overridden method.  I couldn't find a good reason
to keep it, functionally...

> http://github.com/ipython/ipython/commit/205d1241376996eb8ca5e72c7c29d4689a80fbb8
>
> Let's go over the execution model and assumptions in the kernel. I want to
> make sure that the behavior of the kernel in this respect is going to be
> stable moving forward. Most of the issue is that you have done a lot and I
> want to make sure I understand where we are at.

sure, I think I explained that one above better, and the code as its
stands now is far better documented:

http://github.com/ipython/ipython/blob/newkernel/IPython/core/interactiveshell.py#L2112

Though we'll still want to revisit all this, as per the discussion above.

> http://github.com/ipython/ipython/commit/3658d7b687325f6f16f9a0f2289f7bade51aee82
>
> This reminds me I need to add the connect_request handler to the msg_types
> in ipkernel.py.
>
> Let's rename "instant_death" to "now" everywhere as people are used to this
> word in this context (sudo shutdown -h now)

Sounds good, I'll do that in a quick commit now (no pun intended :).
I wasn't very happy with that name either, but couldn't think of a
better one.

> http://github.com/ipython/ipython/commit/0a7f662cac8f303c25ffc8ed8fe625f349c682bd
>
> Let's go over this.  I realize that the cell stuff is important, but I am a
> bit hesitant to teach the kernel about cells in a hackish (your own words ;-)
> way. I don't want to implement things now that cause API or behavior breakage
> when we remove the hacks.

This one is OK from an api/stability perspective: it's 100 internal,
and our API would remain the same moving forward.  What's ugly is the
double-pass, combining my static analysis with Robert's block one.
The real solution is to use the new AST module, which Robert couldn't
use because he wrote that code for 2.5 and ast.py is only in 2.6.

But I don't know the ast module/use well right now, so I figured this
ugly solution would have to do for now.  It's robust, 100% internal to
our own stuff, and once we have the time (or somebody helps us) it can
be replaced with a clean, AST-based one.

> The initialization of the input_splitter in init_instance_attrs should be
> moved to somewhere else. Basically, init_instance_attrs should eventually be
> retired and anything that is traited should not be in there.  You can have
> traits autocreate the input splitter by doing::
>
> input_splitter =
> Instance('IPython.core.inputsplitter.IPythonInputSplitter',(),{})

Ah, OK, cool.

> http://github.com/ipython/ipython/commit/e7e389c1052ceda15656f84edf8cff41f3640992
>
> I know this is not nice to say, but I think we should use a different data
> structure than a NamedTuple or dict here. Basically, we want the message
> protocol to use universal data structures. The closest thing to a NamedTuple
> is a list of tuples:
>
> data = [('type_name',type_name),('base_class',base_class), etc.]
>
> This will be both ordered and easy for both Python and Javascript to
> parse and print.  It will also be easy for Python frontends to put the data
> back into a NamedTuple if needed.  But the wire protocol should be universal.
> If we just send the dict, Javascript clients will loose the ordering info.

Totally agreed, no worries.  Right now only a python client could
reconstruct the order by importing the fields list, by switching to a
list-of-pairs we'll make it language-agnostic.

> http://github.com/ipython/ipython/commit/4e3685be1b53680b1894da6bba96141ed582b123
>
> Add blank line right before def _ofind(...)

yup :)

> http://github.com/ipython/ipython/commit/06dcbd4381870cd44c9b76e7a7be2c0431086264
>
> See above comment about separate _process_*.py files.

As above, let's wait on these until the dust settles, and we'll merge
them back if they stay small.

> http://github.com/ipython/ipython/commit/2be9133b2a3035f51932b3ebd9ac054a9b3e28ba
>
> Maybe rename get_user_variables to something that reflects we are getting
> the repr's of the variables?

We have these two that return reprs:

get_user_variables
eval_expressions

But in python we don't tend to add type names to function names (the
builtin is called sorted() despite the fact that it invariably returns
a list, not sorted_list()).  But I'm open to better name
suggestions...

> Shouldn't the prompt defaults in the IPythonWidget class just be in the
> traits definitions, not at the top of the file?  They are not that long...

No object, I just didn't want to move too  much of the Qt code around,
so I added what I needed next to what was there already.

> Around L257?  "FIXME - fperez: this should be a silent code request." Does
> this need fixing still?

Already fixed by Evan after I added the underlying support in the kernel.

> http://github.com/ipython/ipython/commit/2be9133b2a3035f51932b3ebd9ac054a9b3e28ba#L5R152
>
> Does this FIXME affect the msg spec? Same with other FIXMEs in this file.
> I just want to make sure that all of the FIXMEs in ipkernel are completely
> internal details that don't affect the msg spec or kernel behavior.

No, not the messaging spec.  It's just that we can't get rid of this
one until we stop using runlines() and disentangle our run/exec
routines.  But it's OK to leave it there, it's ugly but 100% internal
for now.  When we stop using runlines and exceptions propagate where
they should, then we can get rid of that hack.

Thanks again for the super review!  Given the mountain of code in
there, you did a tremendous job and I greatly appreciate it.

Let's try to knock out the big api ones, which are the run/exec design
questions above.   The rest is pretty easy stuff.

Cheers,

f