[IPython-dev] Some new cell types for describing data analyses in IPy. Notebook
sychan at lbl.gov
Fri Jul 5 20:09:17 EDT 2013
First off, thanks for posting your work. We are working on
adapting/extending the IPython notebook for use as a
teaching/publication tool, and one of the most important use cases is
to demonstrate alternatives/branches in a notebook, showing how
different options produce different results and documenting the
decisions that went into the final "path" through a notebook.
For the KBase project, we think the ability to show branches and
alternatives is an important feature that we need to implement, and
we would love to work with you further on this.
On Fri, Jul 5, 2013 at 12:28 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
> I am cognizant of the added complexity I am talking about. I disagree a bit with the
> size/damage being attributed to it, but the main place we seem to be on different pages is regarding what it buys us.
There is a broader question here: What constitutes "too complex"?
If you are using a notebook as a teaching tool, or as a way to document
your analysis, being unable to show alternatives would be considered
a significant shortcoming.
I think the complexity here is not conceptual (almost anyone doing any
kind of analysis should understand evaluating alternatives); rather,
the problem lies in the user interface: providing an interface that
works well for novice users and "read only" users of a notebook, as
well as for advanced users and authors who want to express fairly
reasonable ideas in their notebooks.
That being said, I agree with everything you say in the following section:
> The alt cells I'm talking about are not a cool rendering/interaction trick for linear notebooks. If they were I would absolutely agree that the nesting is overkill and not worth its cost.
> Linear/sequential notebooks and other dynamic document systems reproduce computational results. The documents my advisors (Duncan Temple Lang and Deborah Nolan) and I are working towards aim to describe and reproduce the research itself.
> I once asked a room full of quants to raise their hands if their standard operating procedure was to read in the data, maybe clean it a bit, fit a single model, write up the results and be done. Not only did no one raise their hand, but the mere suggestion that that could be how it worked got a substantial laugh from the audience. Even though this is not how the work is done, it is nonetheless the narrative encoded into a linear notebook.
> Scientists and data analysts are already doing branching in their work, but they don't have any good tools to record or describe it that way. So they comment out large blocks of code in linear scripts, or they save old attempts and alternate approaches in separate files, or they (hopefully not) simply delete or overwrite old parameter configurations with new ones when they decide the old one wasn't right. Not because they think these are good ideas, but because it's all they have available and they are scientists/analysts.
> Their job is to analyze data, extract insight and share their work in a useful manner; our job is to conceive and implement the tools they need to do that. They are good at their job, but we are struggling with ours.
> In my opinion, the single most important feature on display in my video is not that the alternative cells are rendered side-by-side, or that they can be executed and the software knows what that means in terms of executing their content; it is that the IPython notebook has become an authoring tool which analysts can use to easily create documents which describe what they actually did in a way that is reproducible, distributable, and most importantly, useful to the analyst.
> Suddenly they don't need to comment out or overwrite code when they take a different approach. Suddenly they can distribute a document that actually describes what they did, instead of just what they found. Suddenly referees don't need to wonder or ask about whether an alternative analysis method was investigated. Suddenly professors can show statistics students what statisticians actually do, instead of how final results are generated.
> You may read all this and think "that is great but way out of scope for what we want to achieve with IPython notebook". That is your right. And if the IPython team feels that way there is, of course, nothing I can do other than what I have done: explore these ideas on my own. I just want to be sure we're all talking about the same thing before that decision gets made.
What I would add is that as more people start using the IPython
notebook, I am fairly certain that others will want a way to represent
alternative options for executing cells. It would be a good idea to
first build an underlying architecture that can support this, and then
think about the UI issues going forward.