[IPython-dev] [FWD] An interesting take on the Notebook Problem

Fri Jul 22 10:05:18 EDT 2005

Fernando Perez wrote:

> [Chris, note that ipython-dev discards non-subscriber posts (too much
> spam). I've manually

thanks for posting this; it was quite a good read, and something that
makes the notebook project even more interesting.

>> takes, although not precisely focused on the notion of an interactive
>> notebook, might still be an interesting read to anyone involved in
>> (..)
>> Robert Gentleman (2005) "Reproducible Research: A Bioinformatics Case
>> Study", Statistical Applications in Genetics and Molecular Biology:
>> Vol. 4: No. 1, Article 2.
>> http://www.bepress.com/sagmb/vol4/iss1/art2
>
in fact, the question i was left pondering, regarding the work on python
notebooks and the discussions about literate programming vs. the 'REPL'
(Read-Eval-Print-Loop, i.e. ipython) mode of working, was how
interactive use fits into that picture.

in that paper, compendiums are defined as 'navigable documents',
meaning that the reader can 'explore and reproduce' the content. is such
exploration of, e.g., a dataset or a model a case of 'REPL', when e.g.
changing variables or perhaps defining functions to explore the
data/model in different cases? i would think so. in that case such
compendiums and ipython would be a perfect match, something that
literate programming (which is of course one of the underlying basic
ideas, as Gentleman also mentions) perhaps does not facilitate by
itself? (although checking out Leo etc. is still on my to-do list, so i
can't really tell.)

installed R the other day to get a feel for that environment. it seemed
like a rich world and the interactive use was pretty nice (though i
still think it is & will be better with Python), but unfortunately i
could not get the GolubRR package to work (yet). it seems to be a
version issue: perhaps GolubRR is packaged for R < 2.0, while the
current version (e.g. in Debian) is > 2.0. a question about it had been
posted to the Bioconductor mailing list, but i could not find any answer
(nor have i gotten one from the poster of that mail):
https://stat.ethz.ch/pipermail/bioconductor/2004-November/006906.html

Besides the SciPy tutorial, perhaps the seminal Golub paper and its R
compendium should also be among the 'test cases' for Python notebooks
(hopefully the dataset is not too tricky to convert to whatever format
our system ends up using).

oh, and one technical issue relating to reproducibility (although that
is probably not among our top questions yet): as Gentleman notes, some
research methods use randomness, and to be able to reproduce exactly
the same computations, one (heavy) way is to include the table of random
numbers used in the compendium. quick Googling didn't tell me whether
the Python random generator is guaranteed to give the same results with
the same seed, and under what conditions (i know from working on
procedural modelling that it at least works on the same computer..), but
that can be looked at later.
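a minimal sketch of both options in Python, assuming only the standard `random` and `json` modules; `draw_table` and the record layout are hypothetical names for illustration. one caveat i'm fairly sure of: the current Python documentation only promises that `random()` keeps producing the same sequence for the same seed across versions, while other methods may change, which is an argument for storing the table itself.

```python
import json
import random

def draw_table(seed, n):
    """Draw n uniform floats in [0, 1) from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private instance: unaffected by other code using random
    return [rng.random() for _ in range(n)]

# Reseeding with the same value reproduces the same sequence.
table_a = draw_table(2005, 5)
assert table_a == draw_table(2005, 5)

# The "heavy" option: record the exact numbers used, so a reader of the
# compendium does not depend on the generator at all. The record layout
# here is hypothetical.
record = json.dumps({"seed": 2005, "values": table_a})
restored = json.loads(record)["values"]
assert restored == table_a  # floats survive the JSON round trip exactly
```

using a private `random.Random` instance rather than the module-level functions keeps other code that draws random numbers from silently shifting the sequence.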

ah, and one simple issue: how do we include the values of variables in
text in our format? Gentleman shows on p. 11 how a result from a
computation is included in a sentence using the Sexpr command in their
system. in Python it would of course be, supposing that 'genes' is the
list of selected genes, simply:

".. filtering process selected %d genes" % len(genes)
OR
"selected " + str(len(genes)) + " genes"
OR
"selected", len(genes), "genes" -- if we somehow support the form that
the print statement handles

but how would that be done in the XML? (i guess it's trivial, but i just
don't know yet.)
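one possible answer, as a minimal sketch only: our notebook XML format isn't settled, so the `<para>` and `<expr>` elements below are hypothetical, as is the `render_para` helper. the idea mimics Sexpr: a Python expression is stored inside the text and its value is spliced in when the document is rendered, here using the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <expr> elements inside a paragraph hold a Python
# expression whose value is spliced into the rendered sentence.
DOC = '<para>The filtering process selected <expr>len(genes)</expr> genes.</para>'

def render_para(xml_text, namespace):
    """Return the paragraph text with every <expr> replaced by its value."""
    para = ET.fromstring(xml_text)
    parts = [para.text or ""]
    for child in para:
        if child.tag == "expr":
            # eval() is acceptable only because the notebook author wrote
            # the expression; never evaluate untrusted documents this way.
            parts.append(str(eval(child.text, namespace)))
        else:
            # Any other inline element: keep its plain text content.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text following the element
    return "".join(parts)

genes = ["g1", "g2", "g3"]  # stand-in for the list of selected genes
print(render_para(DOC, {"genes": genes}))
# prints: The filtering process selected 3 genes.
```

evaluating against an explicit namespace dict (e.g. the notebook's user namespace) keeps the embedded expressions tied to the same variables the reader has been exploring interactively.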

there may be other issues in the paper that would be important to note,
but those are the ones i was left thinking about.

~Toni