[IPython-dev] Some new cell types for describing data analyses in IPy. Notebook

Thu Jul 4 05:13:51 EDT 2013

Hi all, many responses in one mail.

On 3 Jul 2013, at 11:48, Russell Neches wrote:

> In response to Matthias' points...
> 
>> Right now I think storing the notebook as a directed graph is
>> problematic in a few ways, the first being that it is incompatible with
>> the fact that people want to be able to run notebooks in a headless
>> manner, which is not possible if you add explicit choices. 
> 
> If by "headless" execution you mean converting the notebook into a
> regular .py script, I think the directed graph model isn't a problem. In
> fact, it actually elegantly solves a number of thorny problems involved
> with transforming a notebook into a script.

Not exactly. I mean stripping the output from the cells and regenerating it
without an open browser. But if you say the notebook keeps a reference to the
chosen path, then this is not really a problem.

> …

> Gabe is proposing to allow (but not force!) the user to build non-linear
> code paths into their notebooks. This way, it is possible to SPECIFY a
> PARTICULAR path through the code cells, and then output it in a linear
> form. 
> 
> The example he offered is a wonderful illustration of why this is
> awesome. There are eighteen possible .py files (or PDFs, or HTML files,
> etc etc) one could generate, and the user simply needs to choose one. If
> the user wants to proceed to make another script representing a
> different path, they can just activate one of the alternative paths and
> output another script. This is extremely powerful. The user could
> perhaps elect to generate all possible linear scripts for a given
> collection of alternatives, and then dispatch them to different nodes to
> compare and contrast the results. 
> 

I understand what you want to do, but first, IMHO this is too complicated.
Users are already confused by the fact that code cells can be executed out of order.
I also think this is close to the discussion we had about the %with magic,
in the sense that most of the use cases could be solved by using features of
the underlying programming language. In this case, probably an if statement.
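
To make the "if statement" point concrete, here is a minimal sketch in plain
Python. The data set names, file paths and thresholds are hypothetical and
only illustrate choosing one of several incompatible setups without any
special cell type.

```python
# Hypothetical alternatives; names and values are illustrative only.
def load_parameters(dataset):
    """Return the settings for one data set, chosen with plain if/elif."""
    if dataset == "A":
        return {"path": "data_a.csv", "threshold": 0.5}
    elif dataset == "B":
        return {"path": "data_b.csv", "threshold": 0.1}
    else:
        raise ValueError("unknown dataset: %r" % dataset)


DATASET = "A"  # change this one line to take the other path
params = load_parameters(DATASET)
print(params["path"])
```

The rest of the notebook then only reads `params`, so it stays linear whatever
branch was taken.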

> There is even a sensible default behavior; simply linearize according to
> the current selection state of the task cell. The ability to swap around
> chunks of a workflow and then output linearized scripts makes the
> concept of "headless" execution vastly more powerful and interesting. If
> you've ever had to work with things like Sun Grid Engine (a lot of us
> scientists are pretty much stuck with it), this would be a Life Changing
> Ability.

> There is also the matter of incompatible assumptions. I often create
> notebooks that begin with a bunch of code cells, each of which loads a
> different data set and sets parameters related to that particular data
> set. When I use these notebooks, I execute ONE of these, skip the rest,
> and then proceed to the actual analysis. At the moment there is NO WAY
> to correctly output this kind of notebook as a script without modifying
> it. 

I get that, and I myself would really like to execute notebooks in a headless
manner to generate a report based on input data. It still has to be done;
even if it is not that hard, it has to be designed.

It does not take more than 50 lines to do the basics[1]: you just have to start the
kernel yourself before evaluating the notebook, and execute the data loading
before evaluating each cell of the notebook. You can even dynamically inject
the code cells you would like to run, and save the final notebook.
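
A rough sketch of that idea, under loud assumptions: it does not start a real
kernel but runs the code in-process with exec(), and it uses a simplified
notebook dict (a "cells" list with "cell_type", "source" and "outputs") rather
than the real ipynb schema. The function name `run_headless` and the
`preamble` parameter are made up for this example.

```python
import contextlib
import io


def run_headless(nb, preamble=""):
    """Execute the code cells of a simplified notebook dict in order.

    Assumption: a real runner would send each cell to a kernel; here
    exec() in a shared namespace stands in for it.
    """
    namespace = {}
    exec(preamble, namespace)  # injected data-loading code runs first
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(cell["source"], namespace)
        # Replace any stale output with the freshly generated one.
        cell["outputs"] = [buffer.getvalue()]
    return nb


notebook = {"cells": [
    {"cell_type": "markdown", "source": "# Report"},
    {"cell_type": "code", "source": "total = sum(data)", "outputs": []},
    {"cell_type": "code", "source": "print(total)", "outputs": ["stale"]},
]}

result = run_headless(notebook, preamble="data = [1, 2, 3]")
print(result["cells"][2]["outputs"])
```

Swapping the preamble is what selects the data set, so the notebook itself
does not need to know which input it is run against.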

> 
> The directed graph model makes this problem simply go away. I can stick
> these incompatible alternatives into a task, and just pick one of them. 
> 
> Again, nothing forces me to use these features. As Gabe pointed out, a
> linear document is a subset of a directed graph. It should be possible
> to load old notebooks as (rather dull) directed graphs without making a
> giant mess of the JavaScript.

From a theoretical point of view I agree. Nonetheless, inserting, searching and other
common operations would rapidly become rather difficult, and even if the cost were
low here, this would mean that any software that wants to work with ipynb files
would have to support directed graphs.

> 
>> This also contradicts the fact that the notebook captures both the input
>> and the output of the computation. As you showed, there are actually 18
>> different combinations of data analysis, and they are not all stored
>> in the notebook. 
> 
> I haven't dug into Gabe's code, but this doesn't seem to be a problem. A
> task cell has ONE input, ONE output, and at any given time, ONE selected
> execution pathway. From the outside, it works just like a regular code
> cell. It's just got some private state information about which execution
> pathway is currently active.
> 
>> To sum up, I don't think the JS client is, in its current
>> state, the place to implement such an idea. A DAG for cell order
>> might be an idea for a future notebook format, but it needs to be well
>> thought out, and to wait for cell IDs.
>> 
> You mean the JS client was PREVIOUSLY not in a state to implement such an
> idea, and so Gabe fixed it. Hooray! ;-)
> 
> 
> Russell

On 4 Jul 2013, at 03:04, Gabriel Becker wrote:

> This added to the fact that each cell can support arbitrary metadata, you
> should be able to arrange preexisting in structure that work together. It might
> be a little difficult to do it right now as our javascript is not yet modular
> enough to be easily reused, but we are moving toward it.
> 
> Respectfully, rolling my own frontend for ipynb files given all the work the IPython team
> has done on the excellent notebook browser interface would be an enormous and extremely
> wasteful duplication of effort. I don't think it's the right way to pursue these features.

With the current architecture, I agree, but in the end you should be able to include only one javascript file
and the rest should be pulled in with require.js, so you would just need to override what you need.
In a perfect world the notebook would just be a JS library you can use, so you wouldn't have to patch what we do
but just pass a list of the (extra) cell types you want to support, and maybe custom read methods for the cells the core
doesn't know about. Not sure how far we would support that, but it should be pretty easy to make a custom format on top
of ipynb.

> Furthermore, if I were going to write an application offering the types of features I am talking about from scratch,
> there wouldn't be any good reason to base it on the unaltered ipynb format, as they don't easily support the structure
> required by those features without the additional cell types I implemented in my forked version. 
> 
> 
> Right now I think storing the notebook as a directed graph is problematic in a
> few ways,
> 
> I'm not talking about storing the notebook as an actual directed graph data structure. There would be benefits
> to that but it's not necessary and it isn't what I did in my forked version.
> 
> The ability to have nested cells (cells which contain other cells) gets us everything we need structure wise,
> and is the basis of everything seen in both the video (other than interactive code cell stuff) and screenshots
> I posted. The ipynb file for the notebook pictured in the screenshot looks exactly like a normal ipynb file
> except that in the json there are cell declarations which have a cells field which contains the json descriptions
> of the cells contained in that cell.

This is more or less what I called storing as a DAG (or more like a tree, I guess, here). It looks a lot like what
we had with worksheets, and we are moving away from that data structure because of the complexity of handling some cases.

> …
> 
> I apologize for not being clear. As I said in a response above, the directed graph idea was intended to be conceptual
> for thinking about the documents, not structural for actually storing them.

I don't think the two are unrelated. Thinking about and storing documents as graphs could make sense.

> What I actually did was simply allow cell nesting and change indexing so that it is with respect to the parent/container
> (cell or notebook) instead of always with respect to the notebook. This required some machinery changes but not too
> many and it is an extension in the mathematical sense in that indexing will behave identically to the old system for
> notebooks without any nesting while now meaningfully functioning for notebooks with nesting.

I'm still curious about that, and I would be a little worried about how you handle things in the UI.
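
For concreteness, a minimal sketch of parent-relative indexing over nested
cells as described above. This is not Gabriel's actual implementation: only
the nested "cells" field comes from his description, while the "task" value
for "cell_type" and the helper names `get_cell` and `linearize` are
assumptions made for the example.

```python
def get_cell(container, path):
    """Follow a sequence of parent-relative indices down the nesting."""
    node = container
    for index in path:
        node = node["cells"][index]
    return node


def linearize(container):
    """Flatten nested cells depth-first into a plain list of leaf cells."""
    flat = []
    for cell in container["cells"]:
        if "cells" in cell:
            flat.extend(linearize(cell))
        else:
            flat.append(cell)
    return flat


notebook = {"cells": [
    {"cell_type": "code", "source": "import math"},
    {"cell_type": "task", "cells": [  # hypothetical container cell
        {"cell_type": "code", "source": "x = 2"},
        {"cell_type": "code", "source": "y = math.sqrt(x)"},
    ]},
    {"cell_type": "code", "source": "print(y)"},
]}

print(get_cell(notebook, [1, 0])["source"])
print(len(linearize(notebook)))
```

For a notebook without nesting every path has length one, so `get_cell`
reduces to ordinary indexing into the notebook, which matches the "extension
in the mathematical sense" claim.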

On 4 Jul 2013, at 03:59, Brian Granger wrote:

> Gabriel,
> ...
> 
> Second, while it is tempting to generalize the notion of input to
> include widgety things, it is more appropriate to put these things in
> the output:

> * Putting widgets in the input area forces you to do regular
> expression matching to replace those variables in the code.  This
> limits you to an extremely simple event model where the only possible
> event you can know about is substitute the regular expression and run
> all the code.  What if you want different UI controls in the browser
> to trigger different bits of code in the kernels when different fine
> grained events happen.  Making the UI controls live on the Python and
> JS side allows us to build this in a natural way.

This is one place where I sometimes disagree with Brian: I think
input widgets for CodeMirror would be great. Compared with Gabriel's 'interactive'
code cell, I would be more inclined to provide the ability to bind widgets to CodeMirror,
like in http://livecoding.io/3419309
Through a regexp it binds to any variable in the code cell and pops up a widget to change the value;
you don't have to explicitly state which variable should be bound.
Implementation detail: CM provides a method to get the token at the cursor, which helps a lot…
It also has the advantage of working without changing the cell type.
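
The regexp part can be sketched independently of CodeMirror. The example
below is only an illustration in Python, not how livecoding.io does it: it
finds simple numeric assignments in a cell's source and rewrites one of them,
which is what a slider would do on change. The pattern and the helper names
`find_bindable` and `set_value` are assumptions.

```python
import re

# Matches simple one-line assignments such as `alpha = 0.5`.
ASSIGNMENT = re.compile(
    r"^(?P<name>[A-Za-z_]\w*)[ \t]*=[ \t]*(?P<value>-?\d+(?:\.\d+)?)[ \t]*$",
    re.MULTILINE,
)


def find_bindable(source):
    """Return {variable name: numeric value} for every simple assignment."""
    return {m.group("name"): float(m.group("value"))
            for m in ASSIGNMENT.finditer(source)}


def set_value(source, name, new_value):
    """Rewrite the assignment to `name`, leaving all other lines untouched."""
    def replace(match):
        if match.group("name") != name:
            return match.group(0)
        return "%s = %r" % (name, new_value)
    return ASSIGNMENT.sub(replace, source)


cell_source = "alpha = 0.5\nlabel = 'x'\nn = 3\nplot(alpha, n)\n"
print(find_bindable(cell_source))
print(set_value(cell_source, "alpha", 0.8))
```

Nothing in the cell has to be annotated for this to work, which is the point
about not stating explicitly which variables are bound.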

> The alt-cells you show bring up the issue of provenance.  We have some
> very initial thoughts about that, but it is way out of scope for the
> project right now - our plate is 10x overfull already.  We will
> get there eventually, though.

Personally I'm torn about alt cells. I feel this should be done with functions and cases in the underlying language,
but they have a definite advantage for teaching and exploring.

> Thanks for sharing your ideas.

Final thought, 

	Seeing interactive widgets is definitely awesome, and they look fantastic.
I think we can avoid having a specific cell type for that altogether,
and IMHO input methods (like livecoding.io) and interactive widgets are
two complementary approaches. I think Gabriel's way of adding specific widgets
that are bound to specific lines could also be done without a specific cell type,
using metadata. I can't wait to be able to select my matplotlib color in a color picker
directly in CM, for example.

	The task cells are nice, but I think they should be covered using a different mechanism.
As Brian pointed out, we are thinking of using implicit grouping with headers.

	As I already said, I'm torn about alt cells. The UI is nice for teaching, but I think it covers
too small a use case. In particular, to change the path you use, you need user interaction,
or a specific tool that runs the notebook with a selected path.
IMHO this goes against modularity. I am seeing this these weeks because I'm polishing a notebook
for a publication: alt cells would definitely have been useful for testing, but now I'm cursing myself for not having
written functions I could have reused, with if statements to select the case.
	So I would be inclined toward a semi-linear approach where you write the functions you need
and the path is selected using cases in Python, why not with a radio-selector UI that sets the value of a variable
which determines the path taken, without actually selecting a cell at the ipynb level. This also has the advantage of being
compatible with pure Python, without having to generate multiple scripts.
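
A minimal sketch of that semi-linear approach, with hypothetical names
throughout (`summarise_mean`, `summarise_median`, `METHODS`, `analyse`): the
alternatives are ordinary functions, and a single variable, which a
radio-selector could set, decides which one runs.

```python
def summarise_mean(values):
    """First alternative: arithmetic mean."""
    return sum(values) / len(values)


def summarise_median(values):
    """Second alternative: middle element (upper one for even lengths)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


# Hypothetical registry of alternatives, playing the role of an alt cell.
METHODS = {"mean": summarise_mean, "median": summarise_median}


def analyse(values, method):
    """Run the alternative named by `method`."""
    if method not in METHODS:
        raise ValueError("unknown method: %r" % method)
    return METHODS[method](values)


# In a notebook, a radio-selector widget would set this variable.
method = "median"
data = [1, 2, 100]
print(analyse(data, method))
```

Because the choice lives in a variable rather than in the notebook structure,
the same file runs unchanged as a plain script, and looping over `METHODS`
covers every path without generating several scripts.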

	Still really impressed by the work, and I really think this is a good start for more discussion, and a nice starting point
for designing things we will add later.

Sorry for the length and if I missed stuff.
-- 
Matthias
