[IPython-dev] Re: questions about the notebook interface

Robert Kern rkern at ucsd.edu
Tue Jun 28 00:24:02 EDT 2005


John Hunter wrote:
>>>>>>"Robert" == Robert Kern <rkern at ucsd.edu> writes:
> 
>     Robert> And now that I've convinced you that I'm off my
>     Robert> rocker, I'll also add that XML might actually be a
>     Robert> reasonable choice for the meta-format of the notebook
>     Robert> file. But since that choice assumes that you've agreed
> 
> What are the advantages of XML over python as a document format?
> Typically people point to extensibility, parsability and widespread
> utilization.  Python is clearly extensible and widely utilized; with
> access to the interpreter's internal AST representation you also have
> built-in access to the parse tree if you need it.  The idea is that
> everyone writing python pretty much knows python and already has a
> decent python editor and so can readily edit the notebook in their
> favorite editor; the same is not true for python coders and an XML
> format.  If pure python can offer most or all of what XML does as a
> markup language, what do we gain by using XML?  "Say no to toy
> languages" and all that.  [ Truth in advertising: I've never used the
> python parser module so I can't really speak to how suitable it is in
> this context ]

Precisely. The AST has been tuned for working with code, not documents, 
and isn't very friendly even for that.

In [327]: compiler.parse("""from notebook.markup import rest
    .....: rest.title('This is a Python Notebook')
    .....: rest.heading(1,'A first-level heading')
    .....: def f(x, y):
    .....:     print y-x
    .....: """)
Out[327]: Module(None, Stmt([From('notebook.markup', [('rest', None)]),
Discard(CallFunc(Getattr(Name('rest'), 'title'), [Const('This is a Python Notebook')], None, None)),
Discard(CallFunc(Getattr(Name('rest'), 'heading'), [Const(1), Const('A first-level heading')], None, None)),
Function(None, 'f', ['x', 'y'], [], 0, None, Stmt([Printnl([Sub((Name('y'), Name('x')))], None)]))]))
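The `compiler` module used in that session is Python 2 only; in Python 3 its role is taken by the standard `ast` module. As a rough illustration of the kind of tool you have to build on top of a node walker, here is a minimal sketch of an `ast.NodeVisitor` that pulls the `rest.*` markup calls out of the same source. Treating "any call on the bare name `rest`" as markup is an assumption carried over from the example, not a settled design.

```python
import ast

# Same notebook source as the session above, in Python 3 syntax.
SOURCE = """\
from notebook.markup import rest
rest.title('This is a Python Notebook')
rest.heading(1, 'A first-level heading')
def f(x, y):
    print(y - x)
"""


class MarkupCollector(ast.NodeVisitor):
    """Collect calls of the form rest.<name>(<literal args>)."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        func = node.func
        # Only attribute calls on the bare name `rest` count as markup here.
        if (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "rest"):
            # literal_eval raises ValueError if an argument is computed.
            args = [ast.literal_eval(arg) for arg in node.args]
            self.calls.append((func.attr, args))
        self.generic_visit(node)


collector = MarkupCollector()
collector.visit(ast.parse(SOURCE))  # parsing only; nothing is imported or run
print(collector.calls)
# [('title', ['This is a Python Notebook']), ('heading', [1, 'A first-level heading'])]
```

Even this small task needs knowledge of the node classes involved, and the ordinary `print(...)` call has to be filtered out by hand.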

There's compiler.walk() to visit each node of the AST, but that's not a 
tool to process anything; it's a tool to build a tool. Fredrik Lundh's 
ElementTree really simplifies XML processing and very nearly makes XML 
Pythonic. In my experience it has raised XML from "Ew! Avoid at all 
costs!" to "It actually beats anything else I can come up with in as 
short a time." And with cElementTree, it's even going to be faster than 
anything else I'd write. ElementTree is really an in-memory 
representation of a document. It just happens to read from and write to 
disk as XML. If it makes you feel better, we can call it "ElementTree 
format" instead of XML.  :-)
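For contrast, a minimal sketch of the same content held as an ElementTree, using the standard-library `xml.etree.ElementTree` (in Python 3 the C accelerator that cElementTree provided is used automatically when available). The element and attribute names (`notebook`, `title`, `heading`, `code`, `level`) are hypothetical, chosen only for illustration; no notebook schema is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical notebook markup; element names are illustrative only.
DOCUMENT = """\
<notebook>
  <title>This is a Python Notebook</title>
  <heading level="1">A first-level heading</heading>
  <code>def f(x, y):
    print(y - x)
</code>
</notebook>
"""

root = ET.fromstring(DOCUMENT)

# Passive processing: read the structure without executing anything.
title = root.findtext("title")
headings = [(int(h.get("level")), h.text) for h in root.iter("heading")]
code_cells = [c.text for c in root.findall("code")]

# Edit the in-memory tree, then serialize it back to XML text.
ET.SubElement(root, "heading", level="2").text = "A second-level heading"
xml_text = ET.tostring(root, encoding="unicode")

print(title)
print(headings)
```

The document text and the Python code stay in separate elements, so neither has to be parsed as the other.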

A problem that you are going to come across is differentiating between 
Python code and notebook markup. On the one hand, you're going to have 
to deal with namespace collisions in the Python code. You're either 
going to have to enforce an arbitrary restriction on names, or you're 
going to have to choose unwieldy names for the notebook markup objects.
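A minimal sketch of the collision problem, assuming a hypothetical `Rest` markup object injected into the notebook namespace under the name `rest`. Once user code rebinds that name, later markup lines silently lose their meaning, and the only way to notice is to execute the notebook.

```python
class Rest:
    """Hypothetical markup object that records what it is asked to emit."""

    def __init__(self):
        self.emitted = []

    def title(self, text):
        self.emitted.append(("title", text))

    def heading(self, level, text):
        self.emitted.append(("heading", level, text))


NOTEBOOK = """\
rest.title('This is a Python Notebook')
rest = [1, 2, 3]                     # the user's own variable, same name
rest.heading(1, 'A first-level heading')
"""

markup = Rest()
namespace = {"rest": markup}
try:
    exec(NOTEBOOK, namespace)
except AttributeError as exc:        # 'list' object has no attribute 'heading'
    error = exc
else:
    error = None

print(markup.emitted)                # only the title was recorded
print(error)
```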

On the other hand, when people are writing Python, they are going to 
code Python. With Python markup, people are going to do things like 
string interpolation and computed values because they can. And while 
that might be a nice ability to have (although I'd have to be convinced 
why it's necessary above and beyond the recording of Out[NN]'s or 
explicit outputs like the %magics I mentioned), it eliminates the 
ability to process the notebook via passive parsing, which I think is 
crucial. You'd have to execute it.
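To make the passive-parsing point concrete, a minimal sketch using the standard `ast` module (Python 3): a markup argument written as a literal can be read straight from the parse tree, but one built with string interpolation cannot be recovered without running the code. The `rest.title` call and the helper name `passive_title` are hypothetical.

```python
import ast


def passive_title(source):
    """Return the literal argument of the first rest.title(...) call,
    or None if that argument can only be known by executing the code."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "title"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "rest"
                and node.args):
            try:
                return ast.literal_eval(node.args[0])
            except ValueError:
                return None  # computed value: needs execution
    return None


print(passive_title("rest.title('Run 42')"))
print(passive_title("n = 42\nrest.title('Run %d' % n)"))
```

The first call yields `'Run 42'`; the second yields `None`, because `'Run %d' % n` is an expression whose value only exists at run time.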

I consider using a plain-old-editor to be a temporary transition stage 
in the project, and a quick-and-dirty-need-to-change-something-quickly option 
even when the project matures. I think that people have enough 
experience with HTML in this day and age to do these temporary/simple 
tasks. I don't think that we should cater to this stage of development 
at the expense of other qualities that should be in the final product. 
But I may be underestimating the significance of the plain-old-editor 
method. That said, I think the other technical issues are dispositive.

Python markup doesn't quite solve the experience problem. People know 
Python, but the notebook markup isn't actually Python. It's a subset of 
Python syntax being interleaved with real Python in semantically strange 
ways. Once they get past that, they still need to learn the details of 
the markup pseudo-API. Given that they're forgoing the shiny GUI for a 
plain-old-editor, I think it's likely they know enough HTML to do what 
they need to do inside an XML file.

I think that a set of well-crafted example files would go a long way 
towards enabling people to hand-edit the notebook files for their needs.

I don't think you can "Say no to toy languages" here. You need to build 
a toy language regardless. I think the tools for building toy languages 
heavily favor XML and not Python.

Oh yeah, links.

> That is one of my arguments in favor of the "python-as-format" option.
> There are a couple of others, but lest I embarrass myself by
> disagreeing with Robert, I think it best if I collect my thoughts for
> a while.

Don't mistake my long-windedness for knowledgeability. I'm just throwing 
stuff out here.  :-)

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter



