[IPython-dev] Some new cell types for describing data analyses in IPy. Notebook

Wed Jul 3 21:04:54 EDT 2013

Matthias,

Thanks for your detailed response.

On Wed, Jul 3, 2013 at 1:25 AM, Matthias BUSSONNIER <
bussonniermatthias at gmail.com> wrote:

> Gabriel,
>
> Your screenshots are interesting. At some point I played with gridster[1]
> and was more or less able to get cells to rearrange, but didn't keep the
> code. You might be interested.
>
> Keep in mind that the notebook browser interface we ship is only one
> possible frontend that can interpret ipynb files; nothing prevents you
> from writing a different frontend that displays the notebook in a
> different format.
>
> This, added to the fact that each cell can support arbitrary metadata,
> means you should be able to arrange preexisting cells in structures that
> work together. It might be a little difficult to do right now, as our
> JavaScript is not yet modular enough to be easily reused, but we are
> moving toward it.
>

Respectfully, rolling my own frontend for ipynb files given all the work
the IPython team has done on the excellent notebook browser interface would
be an enormous and extremely wasteful duplication of effort. I don't think
it's the right way to pursue these features.

Furthermore, if I were going to write an application offering the types of
features I am talking about from scratch, there wouldn't be any good reason
to base it on the unaltered ipynb format, as it doesn't easily support the
structure required by those features without the additional cell types I
implemented in my forked version.

> Right now I think storing the notebook as a directed graph is problematic
> in a few ways,
>

I'm not talking about storing the notebook as an actual directed graph data
structure. There would be benefits to that, but it's not necessary and it
isn't what I did in my forked version.

The ability to have nested cells (cells which contain other cells) gets us
everything we need structure-wise, and is the basis of everything seen in
both the video (other than the interactive code cell stuff) and the
screenshots I posted. The ipynb file for the notebook pictured in the
screenshot looks exactly like a normal ipynb file, except that in the JSON
there are cell declarations which have a cells field containing the JSON
descriptions of the cells nested inside that cell.
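
A minimal sketch of what such a nested structure could look like, written
as a Python dict that round-trips through JSON. Only the "cells" field and
the altset cell come from the description above; the "branch" cell type,
the simplified top-level layout, and the iter_cells helper are assumptions
made for illustration, not the actual format of the fork.

```python
import json

# Hypothetical, simplified notebook: a container is anything with "cells".
notebook = {
    "cells": [
        {"cell_type": "code", "input": "data = load()"},
        {
            "cell_type": "altset",  # container holding alternative branches
            "cells": [
                {"cell_type": "branch",  # assumed name for one alternative
                 "cells": [{"cell_type": "code", "input": "fit = lm(data)"}]},
                {"cell_type": "branch",
                 "cells": [{"cell_type": "code", "input": "fit = rlm(data)"}]},
            ],
        },
    ]
}


def iter_cells(container, depth=0):
    """Yield (depth, cell) for every cell, descending into nested cells."""
    for cell in container.get("cells", []):
        yield depth, cell
        yield from iter_cells(cell, depth + 1)


# The nested structure is still plain JSON.
restored = json.loads(json.dumps(notebook))

for depth, cell in iter_cells(restored):
    print("  " * depth + cell["cell_type"])
```

Cells without a "cells" field are leaves, so a notebook with no nesting is
walked exactly as a flat list would be.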

> the first being that it is incompatible with the fact that people want
> to be able to run notebooks in a headless manner, which is not possible
> if you add explicit choice.
>

This isn't the case. The JSON-saved versions of notebooks with branching
remember which branch was most recently run. When an altset cell is
executed, it runs only the most recently run (or currently "selected",
though that means something else internally) branch. Thus by doing the
naive thing and looping through all top-level cells and executing them, the
currently chosen path through the notebook can easily be run in a headless
environment and give the correct results.
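
The naive loop described above can be sketched as follows. This is an
illustration under assumptions, not the code from the fork: the "last_run"
field (index of the most recently run branch), the "branch" cell type, and
the run_headless function are hypothetical names.

```python
def run_headless(container, execute):
    """Run cells in order, following only the remembered branch of an altset."""
    for cell in container.get("cells", []):
        kind = cell.get("cell_type")
        if kind == "code":
            execute(cell["input"])
        elif kind == "altset":
            # Assumed field: index of the branch that was most recently run.
            chosen = cell["cells"][cell.get("last_run", 0)]
            run_headless(chosen, execute)
        elif "cells" in cell:
            # Any other container: run all of its children in order.
            run_headless(cell, execute)


notebook = {
    "cells": [
        {"cell_type": "code", "input": "x = 1"},
        {
            "cell_type": "altset",
            "last_run": 1,
            "cells": [
                {"cell_type": "branch",
                 "cells": [{"cell_type": "code", "input": "y = x + 1"}]},
                {"cell_type": "branch",
                 "cells": [{"cell_type": "code", "input": "y = x * 10"}]},
            ],
        },
        {"cell_type": "code", "input": "z = y"},
    ]
}

namespace = {}
executed = []


def execute(source):
    """Stand-in for a kernel: record the source and run it locally."""
    executed.append(source)
    exec(source, namespace)


run_headless(notebook, execute)
print(executed)
```

No user interaction is needed because the choice is read from the saved
file rather than asked for at run time.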

> This also contradicts the fact that the notebook captures
> both the input and the output of the computation.
>

I don't really understand what you mean by this. In the JSON representation
of an executed code cell, the input field contains the code, but not any
values of variables used by the code, nor any indication of code which was
run before executing the code cell.

Changing and rerunning an earlier code cell without re-executing the cell
in question can easily invalidate the output stored in the JSON, even
without the concept of branching or choice.
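
A small simulation of that staleness, with no branching involved. The cell
dicts, the "shows" field (the variable whose value is recorded as output),
and the run helper are all hypothetical stand-ins for a real kernel and
notebook.

```python
def run(cell, namespace):
    """Execute a cell and store the value of its 'shows' variable as output."""
    exec(cell["input"], namespace)
    cell["output"] = namespace[cell["shows"]]


namespace = {}
cells = [
    {"input": "a = 1", "shows": "a", "output": None},
    {"input": "b = a * 2", "shows": "b", "output": None},
]

# Run the notebook top to bottom once.
for cell in cells:
    run(cell, namespace)

# Edit and re-run only the first cell.
cells[0]["input"] = "a = 5"
run(cells[0], namespace)

# The stored output of the second cell no longer matches the current state.
stored = cells[1]["output"]
fresh = namespace["a"] * 2
print(stored, fresh)
```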

> As you showed there are
> actually 18 different combinations of data analysis, and they are not all
> stored in the notebook.
>

The notebook knows and records which choices were made. There are 18
different combinations of data analysis *but only one was chosen by the
analyst as generating the final/most recent result*.

In the case of "publishing" an analysis, the notebook stores the path most
recently chosen by the analyst, while retaining information about what else
he or she did during the decision process.

In the case of instruction, imagine how much easier it would be to teach
data analysis if the students could actually see what data analysts do,
instead of simply the final method they choose in a particular analysis.

>
> I really think this is an interesting project, and reusing only our
> metadata in the notebook, you should be able to simulate it (store the
> DAG in notebook-level metadata, and cell IDs in cell metadata) then
> reconstruct the graph when needed. Keep in mind that at some point we
> might/will add cell IDs to the notebook.
>
> To sum up, I don't think the current JS client is, in its current state,
> the place to implement such an idea. The DAG for cell order might be an
> idea for a future notebook format but needs to be well thought out, and
> to wait for cell IDs.
>
>

I apologize for not being clear. As I said in a response above, the
directed graph idea was intended to be conceptual for thinking about the
documents, not structural for actually storing them.

What I actually did was simply allow cell nesting and change indexing so
that it is with respect to the parent/container (cell or notebook) instead
of always with respect to the notebook. This required some machinery
changes, but not too many. It is an extension in the mathematical sense:
indexing behaves identically to the old system for notebooks without any
nesting, while now functioning meaningfully for notebooks with nesting.
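
One way to picture container-relative indexing is as a path of indices,
one per nesting level. The get_cell function and the path-as-tuple
convention below are assumptions for illustration, not the machinery in
the fork.

```python
def get_cell(container, path):
    """Follow one index per nesting level, each relative to its parent."""
    cell = container
    for index in path:
        cell = cell["cells"][index]
    return cell


# A flat notebook: a one-element path is the same as the old flat index.
flat = {"cells": [{"input": "a = 1"}, {"input": "b = 2"}]}
same_as_before = get_cell(flat, (1,)) is flat["cells"][1]

# A nested notebook: deeper paths address cells inside containers.
nested = {
    "cells": [
        {"input": "a = 1"},
        {"cells": [
            {"cells": [{"input": "fit = first(a)"}]},
            {"cells": [{"input": "fit = second(a)"}]},
        ]},
    ]
}
inner = get_cell(nested, (1, 1, 0))
print(same_as_before, inner["input"])
```

With no nesting every path has length one, so the old behavior is the
special case of the new one.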

~G

>
> --
> Matthias
>
>
>
> [1] http://gridster.net/

-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis