[IPython-dev] magics and metadata

Tue Jun 19 20:01:10 EDT 2012

On Tue, Jun 19, 2012 at 4:20 PM, Brian Granger <ellisonbg at gmail.com> wrote:

> When the metadata PR came up, I was originally going to vote -1 on it
> because of this issue.  I sat on it for a while and in the end decided
> that it was OK because I think the need for metadata is already upon
> us even though we don't have an actual usage case in our own code base
> (for example, we don't have a metadata UI in the notebook web app).
>
> There is a fine line to walk here.  On one hand, I completely agree
> with you that we should try to future-proof the notebook format to
> minimize disruptive format changes.  On the other hand, adding things
> too soon leads to even more potential disruption for the following
> reason.  As I developed the notebook format and notebook UI last
> summer, there were multiple situations where I added something to the
> notebook format before I actually used it in the UI.  In many of these
> cases, when I did get around to developing the UI for it, I realized
> that my original thoughts on that element were incomplete.  It wasn't
> until I wrote the UI that used the data that I realized exactly what
> the format of that data needed to be.  As a result, I had to go back
> and modify the notebook format.  After a few iterations of this, I
> realized that this approach was broken and started to enforce the
> following simple rule on myself: don't add it to the notebook format
> until I am ready to write the UI code that uses it.  That rule served
> me very well last summer.
>
> This is why for example the notebook and cells do not currently have
> any timestamp information (even though I think we will eventually want
> it).  The one notebook feature (which I regret adding to the format)
> that doesn't have a UI is the multiple worksheets.  We absolutely want
> that as a feature, I just wish I had waited to add it to the notebook
> format.  When we do implement the multiple worksheet UI, it is likely
> we will want to go back and make changes to the notebook format to
> better reflect the UI (for example, we will probably want to persist
> which worksheet is active/open).
>

I couldn't agree less.  There is simply no reason that adding support for
multiple worksheets in future versions of IPython should render
single-sheet notebooks unreadable in 0.13, just like adding new metadata
should not make the notebook artificially unreadable.

>
> For the cell and worksheet metadata, I knew we would eventually need
> it and I didn't want to hold up the beta release any longer.  But
> there are still unanswered questions related to it:
>
> * What types of things go in the metadata?
>
> * Is this an area for us to write data to, or for advanced users to
> write data to?
> * Is it entirely unstructured, or will we require a discussion for
> each new key/value entry into it?

> It is not at all clear that the current metadata design will hold up
> to our answers to these questions.  But in the end, I sort of wanted
> to add the metadata as it is now, so we could begin to see how we and
> others start to use it.  But just because we added the metadata to the
> notebook format definitely doesn't mean that this part of the notebook
> format is future-proof.
>

> Hope this clarifies things a bit.
>

Sure, while it is extremely clear that we need cell metadata, we cannot be
100% certain that a simple dict will solve 100% of the cases we encounter.
But adding it now means that we have at least a *chance* of making a
release that is not backwards-incompatible.

>
> Back to the question of output-level metadata.  When a bit of code
> remains unused for almost a year, I start to question whether we
> really need it.  I am not convinced we don't need it; I am just not sure.
> In light of this, I don't think that adding it to the notebook format
> makes sense.  When one of us finds a good purpose for this metadata,
> let's add it to the nbformat then.
>

I believe the only current use is in the parallel display republishing,
where the engine ID is added to the display data so that frontends could
theoretically draw display data differently based on which engine it came
from.
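
To make that concrete, here is a rough sketch of what a frontend could do
with such metadata.  The message layout, the "engine" key, and the
label_for helper are all made up for illustration; this is not the actual
message spec or any existing frontend code.

```python
# Hypothetical sketch: the message layout and the "engine" metadata key are
# assumptions for illustration, not the actual IPython message format.

def label_for(display_msg):
    """Return a prefix a frontend might show before rendering display data."""
    metadata = display_msg.get("metadata", {})
    engine = metadata.get("engine")
    if engine is None:
        # No engine information: treat it as ordinary local output.
        return "[local]"
    return "[engine %s]" % engine


msgs = [
    {"data": {"text/plain": "42"}, "metadata": {"engine": 0}},
    {"data": {"text/plain": "43"}, "metadata": {"engine": 1}},
    {"data": {"text/plain": "hi"}},  # no metadata at all
]

labels = [label_for(m) for m in msgs]
for m, label in zip(msgs, labels):
    print(label, m["data"]["text/plain"])
```

A frontend that knows nothing about the key can simply skip the lookup and
render the data as usual.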

>
> The other philosophical line of reasoning that I am being guided by
> here is simplicity.  It would be very easy to over-design the notebook
> format and add all sorts of features that we might need.  I think this
> is the wrong direction to go.  We want a notebook format that is as
> compact and minimal as possible, where each and every bit of data is
> there for a well-defined and justified reason.
>

I think it's simple: We have had ideas over and over and over again for
features requiring metadata attached to cells (hashes, links, timestamps,
etc.), so this is clearly a feature we have a need for right now.  It would
be totally silly for adding timestamps to require updating the nbformat in
a backward-incompatible way.  And the biggest advantage of using JSON is
that adding keys has no effect on backwards *readability*.  It's only
adding values/types that can cause problems, and should force new versions
(e.g. changing worksheet to worksheets, or adding new cell types).
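
A minimal sketch of the difference, assuming a simplified cell layout (this
is not the real nbformat schema; the reader function, the "slide" cell type,
and the timestamp value are invented for the example):

```python
import json

# Hypothetical, simplified cell layout; not the real nbformat schema.
KNOWN_CELL_TYPES = {"code", "markdown"}


def read_cell_source(cell_json):
    """An 'old' reader: it only looks at the keys it already knows about."""
    cell = json.loads(cell_json)
    if cell["cell_type"] not in KNOWN_CELL_TYPES:
        raise ValueError("unknown cell type: %r" % cell["cell_type"])
    return cell["source"]


# A cell as an old writer would produce it.
old_cell = json.dumps({"cell_type": "code", "source": "1 + 1"})

# The same cell from a newer writer that adds a metadata key.
new_cell = json.dumps({
    "cell_type": "code",
    "source": "1 + 1",
    "metadata": {"timestamp": "2012-06-19T20:01:10"},  # illustrative value
})

# A cell using a new *value* for an existing key.
new_type = json.dumps({"cell_type": "slide", "source": "..."})

# The extra key is ignored, so the old reader sees the same cell.
same_source = read_cell_source(old_cell) == read_cell_source(new_cell)

# The new cell type is something the old reader cannot handle.
try:
    read_cell_source(new_type)
    new_type_readable = True
except ValueError:
    new_type_readable = False

print(same_source, new_type_readable)
```

The extra key costs the old reader nothing, while the new cell type is the
kind of change that really does call for a new format version.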

-MinRK

> Cheers,
>
> Brian
>
>
>
> On Tue, Jun 19, 2012 at 3:25 PM, MinRK <benjaminrk at gmail.com> wrote:
> >
> >
> > On Tue, Jun 19, 2012 at 3:23 PM, Brian Granger <ellisonbg at gmail.com> wrote:
> >>
> >> On Tue, Jun 19, 2012 at 3:19 PM, MinRK <benjaminrk at gmail.com> wrote:
> >> >
> >> >
> >> > On Tue, Jun 19, 2012 at 3:18 PM, Brian Granger <ellisonbg at gmail.com> wrote:
> >> >>
> >> >> On Tue, Jun 19, 2012 at 2:59 PM, Fernando Perez <fperez.net at gmail.com> wrote:
> >> >> > On Tue, Jun 19, 2012 at 1:17 PM, MinRK <benjaminrk at gmail.com> wrote:
> >> >> >> Yes - we put metadata on outputs for a reason, presumably.  If this
> >> >> >> shouldn't be saved, it should probably be removed from the API.
> >> >> >
> >> >> > I can't recall precisely what we had in mind when we put it in, but
> >> >> > something that springs to mind as potentially useful, for example,
> >> >> > would be to specify a desired priority order for the various types of
> >> >> > outputs. Right now when a client can display several kinds of output
> >> >> > it just makes a choice, but we could let objects provide a hint of the
> >> >> > preferred order, based on what they know about the relative quality of
> >> >> > each.
> >> >>
> >> >> I originally put it there to allow objects to provide hints to the
> >> >> frontend on how it should display a representation.  This is similar
> >> >> to how the payloads can indicate where they came from.
> >> >>
> >> >> > So I'd vote for not removing this, as it may prove useful...
> >> >>
> >> >> I also think it could be useful, although it seems a bit excessive to
> >> >> store metadata for each output.  Here is what I propose.  We simply
> >> >> leave it alone until we have an actual use case that will help us
> >> >> figure out exactly what this should look like.  Without a concrete
> >> >> usage case, it is difficult to know what is needed.
> >> >
> >> >
> >> > But this doesn't answer the immediate question: Should this metadata
> >> > dict be included in the nbformat?
> >>
> >> I would vote no - not until we have a real usage case.  I don't like
> >> to add things to the notebook format until we are actually using them.
> >
> >
> > Then should we remove all of the metadata stuff we just added?  The whole
> > point was to prepare the nbformat for future changes so we don't have to
> > update the nbformat, which is incredibly painful and should be done as
> > rarely as possible.
> >
> > -MinRK
> >
> >>
> >>
> >> >>
> >> >>
> >> >> > f
> >> >> > _______________________________________________
> >> >> > IPython-dev mailing list
> >> >> > IPython-dev at scipy.org
> >> >> > http://mail.scipy.org/mailman/listinfo/ipython-dev
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Brian E. Granger
> >> >> Cal Poly State University, San Luis Obispo
> >> >> bgranger at calpoly.edu and ellisonbg at gmail.com