[IPython-dev] Some new cell types for describing data analyses in IPy. Notebook

Brian Granger ellisonbg at gmail.com
Wed Jul 3 21:59:48 EDT 2013


Gabriel,

I watched your video and there are some nice ideas here.  We are not
headed in this direction in terms of *implementation* but I think you
will find that similar *capabilities* will show up in the notebook
over time.  A few comments about the implementation aspects:

First, the benefits of having a notebook be a linear sequence of cells
are massive:

* Simple, simple, simple - this makes it very easy to reason about
notebooks in code.  Nesting leads to complexity that is not worth the
cost.
* You can get most of the benefits of nesting without the complexity.
As Min mentioned, there is an implied hierarchy in the heading cells.
We plan on using that to allow group level actions - show/hide, run
group, move, cut/copy/paste, etc.
* It is not difficult to think about building a proper diff tool for
notebooks.  With nested cells this becomes horrific.
* Getting the UI/UX right for nested cells is a nightmare.  If you
have ever used Mathematica, you will know this - beyond 1-2 levels,
the UI becomes unusable.
* Hierarchy puts a significant cognitive load on users.

Because of these things we don't have any plans on changing the
notebook document format or notebook UI to allow nested cells.
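
As a rough illustration of the implied hierarchy in heading cells, here
is a minimal sketch of how a group could be derived from a flat cell
list.  The dict keys and the group_range helper are assumptions made
for this example, not the notebook's actual code.

```python
# Hypothetical flat cell list; the keys are simplified for illustration.
cells = [
    {"type": "heading", "level": 1, "source": "Analysis"},
    {"type": "code", "source": "data = list(range(10))"},
    {"type": "heading", "level": 2, "source": "Load data"},
    {"type": "code", "source": "subset = data[:5]"},
    {"type": "heading", "level": 2, "source": "Summarize"},
    {"type": "code", "source": "print(sum(subset))"},
    {"type": "heading", "level": 1, "source": "Conclusions"},
    {"type": "markdown", "source": "Done."},
]


def group_range(cells, start):
    """Return (start, stop) for the group opened by the heading at `start`.

    The group runs until the next heading with the same or a smaller
    level number, or to the end of the list.  `stop` is exclusive.
    """
    head = cells[start]
    if head["type"] != "heading":
        raise ValueError("cell %d is not a heading" % start)
    stop = start + 1
    while stop < len(cells):
        cell = cells[stop]
        if cell["type"] == "heading" and cell["level"] <= head["level"]:
            break
        stop += 1
    return start, stop


print(group_range(cells, 0))  # the whole "Analysis" section
print(group_range(cells, 2))  # only the "Load data" subsection
```

A group-level action (show/hide, run, move, cut) would then operate on
the slice cells[start:stop], while the stored document stays a flat
list.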

Second, while it is tempting to generalize the notion of input to
include widgety things, it is more appropriate to put these things in
the output:

* Our output architecture has the notion of multiple representations.
This allows us to build rich widgets, as you have done, while still
providing static representations (png, jpg, latex).
* Having the multiple representations of output allows us to build the
rich widgets, but maintain a clear path for converting notebooks to
static formats (pdf, html, word, powerpoint).
* Insisting that input cells are pure code allows you to reason in a
clear manner about how a notebook works: code runs and leads to
output.  That reasoning can be applied in an automated manner by
running notebooks in batch mode, or building a test system based on
them.
* Putting widgets in the input area forces you to do regular
expression matching to replace those variables in the code.  This
limits you to an extremely simple event model where the only possible
event you can know about is "substitute the regular expression and run
all the code".  What if you want different UI controls in the browser
to trigger different bits of code in the kernels when different
fine-grained events happen?  Making the UI controls live on the Python and
JS side allows us to build this in a natural way.
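
To make the "multiple representations" point concrete, here is a small
sketch.  The _repr_html_ and _repr_latex_ method names follow IPython's
rich display convention; the Gauge class and the
collect_representations helper are hypothetical and only imitate, in a
very simplified way, what the display machinery does.

```python
class Gauge(object):
    """Toy value with several representations (hypothetical example class)."""

    def __init__(self, value, maximum):
        self.value = value
        self.maximum = maximum

    def __repr__(self):
        return "Gauge(%r, %r)" % (self.value, self.maximum)

    def _repr_html_(self):
        # Rich representation for a live browser frontend.
        return '<progress value="%s" max="%s"></progress>' % (
            self.value, self.maximum)

    def _repr_latex_(self):
        # Static representation that survives conversion to LaTeX/PDF.
        return r"$\frac{%s}{%s}$" % (self.value, self.maximum)


# Assumed mapping from MIME type to method name, for this sketch only.
REPR_METHODS = [
    ("text/html", "_repr_html_"),
    ("text/latex", "_repr_latex_"),
    ("image/png", "_repr_png_"),
]


def collect_representations(obj):
    """Gather every representation the object offers, keyed by MIME type."""
    bundle = {"text/plain": repr(obj)}
    for mime, name in REPR_METHODS:
        method = getattr(obj, name, None)
        if method is not None:
            bundle[mime] = method()
    return bundle


bundle = collect_representations(Gauge(3, 10))
for mime in sorted(bundle):
    print(mime, "->", bundle[mime])
```

A live frontend can pick the richest entry it understands, while a
converter to a static format can fall back to the LaTeX or plain-text
entry of the same output.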

The alt-cells you show bring up the issue of provenance.  We have some
very initial thoughts about that, but it is way out of scope for the
project right now - our plates are 10x overfull already.  We will
get there eventually, though.

Thanks for sharing your ideas.

PS - for a bit more background about the context of our saying "no" to
this feature request, see this blog post:

http://brianegranger.com/?p=249

I also gave a talk about this at SciPy and will be posting my slides soon.

Cheers,

Brian

On Wed, Jul 3, 2013 at 6:04 PM, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
> Matthias,
>
> Thanks for your detailed response.
>
>
> On Wed, Jul 3, 2013 at 1:25 AM, Matthias BUSSONNIER
> <bussonniermatthias at gmail.com> wrote:
>>
>> Gabriel,
>>
>> Your screenshots are interesting.
>> At some point I played with gridster[1]
>>
>> and was more or less able to get cells to rearrange, but didn't keep the
>> code.
>> You might be interested.
>>
>> Keep in mind that the notebook browser-interface we ship is only one
>> possible
>> frontend that can interpret ipynb files; nothing prevents you from writing a
>> different frontend that displays the notebook in a different format.
>>
>> This added to the fact that each cell can support arbitrary metadata, you
>> should be able to arrange preexisting cells in structures that work
>> together. It might
>> be a little difficult to do it right now as our javascript is not yet
>> modular
>> enough to be easily reused, but we are moving toward it.
>
>
> Respectfully, rolling my own frontend for ipynb files given all the work the
> IPython team has done on the excellent notebook browser interface would be
> an enormous and extremely wasteful duplication of effort. I don't think it's
> the right way to pursue these features.
>
> Furthermore, if I were going to write an application offering the types of
> features I am talking about from scratch, there wouldn't be any good reason
> to base it on the unaltered ipynb format, as they don't easily support the
> structure required by those features without the additional cell types I
> implemented in my forked version.
>
>>
>> Right now I think storing the notebook as a directed graph is problematic
>> in a
>> few ways,
>
>
> I'm not talking about storing the notebook as an actual directed graph data
> structure. There would be benefits to that but it's not necessary and it
> isn't what I did in my forked version.
>
> The ability to have nested cells (cells which contain other cells) gets us
> everything we need structure wise, and is the basis of everything seen in
> both the video (other than interactive code cell stuff) and screenshots I
> posted. The ipynb file for the notebook pictured in the screenshot looks
> exactly like a normal ipynb file except that in the json there are cell
> declarations which have a cells field which contains the json descriptions
> of the cells contained in that cell.
>
>
>>
>> the first being that it is incompatible with the fact that people want
>> to be able to run notebooks in a headless manner, which if you add explicit
>> choice is not possible.
>
>
> This isn't the case. The json saved versions of notebooks with branching
> remember which version was most recently run. When an altset cell is
> executed, it runs only the most recently run (or currently "selected",
> though that means something else internally) branch. Thus by doing the naive
> thing and looping through all top level cells and executing them, the
> currently chosen path through the notebook can easily be run in a headless
> environment and give the correct results.
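
For concreteness, the headless loop described above might look roughly
like the sketch below.  The "cells" and "input" fields are the ones
mentioned in this thread; the "altset" cell type name, the "selected"
field and both function names are assumptions made for the example, not
the fork's actual format.

```python
def run_cell(cell, namespace):
    """Execute one cell; container cells recurse into their `cells` field."""
    kind = cell["cell_type"]
    if kind == "code":
        exec(cell["input"], namespace)
    elif kind == "altset":
        # Only the branch recorded as selected is run (field name assumed).
        run_cell(cell["cells"][cell["selected"]], namespace)
    elif "cells" in cell:
        # Any other container: run its children in order.
        for child in cell["cells"]:
            run_cell(child, namespace)
    # Non-code leaf cells (markdown, headings) are skipped.


def run_headless(notebook):
    """Run every top-level cell in order and return the final namespace."""
    namespace = {}
    for cell in notebook["cells"]:
        run_cell(cell, namespace)
    return namespace


notebook = {
    "cells": [
        {"cell_type": "code", "input": "data = [1, 2, 3, 100]"},
        {
            "cell_type": "altset",
            "selected": 1,
            "cells": [
                {"cell_type": "code",
                 "input": "center = sum(data) / len(data)"},
                {"cell_type": "code",
                 "input": "center = sorted(data)[len(data) // 2]"},
            ],
        },
        {"cell_type": "markdown", "source": "Report the chosen center."},
    ]
}

result = run_headless(notebook)
print(result["center"])
```

Only the second alternative runs here, so the unselected branch never
touches the namespace.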
>
>>
>> This also contradicts the fact that the notebook captures
>> both the input and the output of the computation.
>
>
> I don't really understand what you mean by this. In the JSON representation
> of an executed code cell, the input field contains the code, but not any
> values of variables used by the code, nor any indication of code which was
> run before executing the code cell.
>
> Changing and rerunning an earlier code cell without re-executing the cell in
> question can easily invalidate the output stored in the JSON, even without
> the concept of branching or choice.
>
>
>>
>> As you showed there are
>> actually 18 different combinations of data analysis, and they are not all
>> stored in the notebook.
>
>
> The notebook knows and records which choices were made. There are 18
> different combinations of data analysis but only one was chosen by the analyst
> as generating the final/most recent result.
>
> In the case of "publishing" about an analysis the notebook stores the path
> most chosen by the analyst, while retaining information about what else he
> or she did during the decision process.
>
> In the case of instruction, imagine how much easier it would be to teach
> data analysis if the students could actually see what data analysts do,
> instead of simply the final method they choose in a particular analysis.
>
>
>>
>>
>> I really think this is an interesting project, and reusing only our
>> metadata in
>> the notebook, you should be able to  simulate it (store the dag in
>> notebook
>> level metadata, and cell id in cell metadata) then reconstruct the graph
>> when
>> needed. Keep in mind that at some point we might/will add cell id to the
>> notebook.
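
A minimal sketch of that metadata-only approach follows.  The "dag" and
"id" key names and the reconstruct_graph helper are hypothetical; the
thread only says that the graph would live in notebook-level metadata
and the cell ids in cell metadata.

```python
notebook = {
    "metadata": {
        # Hypothetical key: successor ids for each cell id.
        "dag": {
            "load": ["trim", "impute"],
            "trim": ["fit"],
            "impute": ["fit"],
            "fit": [],
        }
    },
    "cells": [
        {"cell_type": "code", "input": "raw = [1, 2, None, 4]",
         "metadata": {"id": "load"}},
        {"cell_type": "code",
         "input": "clean = [x for x in raw if x is not None]",
         "metadata": {"id": "trim"}},
        {"cell_type": "code", "input": "clean = [x or 0 for x in raw]",
         "metadata": {"id": "impute"}},
        {"cell_type": "code", "input": "total = sum(clean)",
         "metadata": {"id": "fit"}},
    ],
}


def reconstruct_graph(notebook):
    """Return {cell id: list of successor cells} from the stored metadata."""
    by_id = {cell["metadata"]["id"]: cell for cell in notebook["cells"]}
    edges = notebook["metadata"].get("dag", {})
    return {cid: [by_id[s] for s in edges.get(cid, [])] for cid in by_id}


graph = reconstruct_graph(notebook)
print([cell["metadata"]["id"] for cell in graph["load"]])
```

The cell list itself stays linear, so a frontend that ignores the extra
metadata still sees an ordinary notebook.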
>>
>> To sum up, I don't think the current JS client is in its current state
>> the
>> place to implement such an idea. The DAG for cell order might be an idea
>> for
>> future notebook formats but needs to be well thought out, and to wait for cell IDs.
>
>
> I apologize for not being clear. As I said in a response above, the directed
> graph idea was intended to be conceptual for thinking about the documents,
> not structural for actually storing them.
>
> What I actually did was simply allow cell nesting and change indexing so
> that it is with respect to the parent/container (cell or notebook) instead
> of always with respect to the notebook. This required some machinery changes
> but not too many and it is an extension in the mathematical sense in that
> indexing will behave identically to the old system for notebooks without any
> nesting while now meaningfully functioning for notebooks with nesting.
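
One way to picture indexing relative to the parent is a path of
indices, as in the sketch below.  The get_cell helper and the
"container" cell type name are illustrative assumptions, not the fork's
actual code.

```python
def get_cell(container, path):
    """Follow a sequence of indices, each relative to the current container."""
    node = container
    for index in path:
        node = node["cells"][index]
    return node


# A notebook without nesting: a one-element path is ordinary indexing.
flat = {"cells": [{"input": "a = 1"}, {"input": "b = 2"}]}

# A notebook with one container cell holding two children.
nested = {
    "cells": [
        {"input": "a = 1"},
        {"cell_type": "container",
         "cells": [{"input": "b = 2"}, {"input": "c = 3"}]},
    ]
}

print(get_cell(flat, (1,))["input"])
print(get_cell(nested, (1, 0))["input"])
```

For a notebook without nesting every path has length one, so the lookup
reduces to the old notebook-level index.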
>
> ~G
>>
>>
>> --
>> Matthias
>>
>>
>>
>> [1] http://gridster.net/
>>
>> _______________________________________________
>> IPython-dev mailing list
>> IPython-dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis
>
>



-- 
Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu and ellisonbg at gmail.com


