[IPython-dev] magics and metadata

Wed Jun 20 18:09:35 EDT 2012

On Jun 20, 2012, at 11:06 AM, Brian Granger <ellisonbg at gmail.com> wrote:

> On Tue, Jun 19, 2012 at 7:49 PM, MinRK <benjaminrk at gmail.com> wrote:
>>
>>
>> On Tue, Jun 19, 2012 at 7:25 PM, Brian Granger <ellisonbg at gmail.com> wrote:
>>>
>>> On Tue, Jun 19, 2012 at 5:01 PM, MinRK <benjaminrk at gmail.com> wrote:
>>>>
>>>>
>>>> On Tue, Jun 19, 2012 at 4:20 PM, Brian Granger <ellisonbg at gmail.com>
>>>> wrote:
>>>>>
>>>>> When the metadata PR came up, I was originally going to vote -1 on it
>>>>> because of this issue.  I sat on it for a while and in the end decided
>>>>> that it was OK because I think the need for metadata is already upon
>>>>> us even though we don't have an actual usage case in our own code base
>>>>> (for example, we don't have a metadata UI in the notebook web app).
>>>>>
>>>>> There is a fine line to walk here.  On one hand, I completely agree
>>>>> with you that we should try to future-proof the notebook format to
>>>>> minimize disruptive format changes.  On the other hand, adding things
>>>>> too soon leads to even more potential disruption for the following
>>>>> reason.  As I developed the notebook format and notebook UI last
>>>>> summer, there were multiple situations where I added something to the
>>>>> notebook format before I actually used it in the UI.  In many of these
>>>>> cases, when I did get around to developing the UI for it, I realized
>>>>> that my original thoughts on that element were incomplete.  It wasn't
>>>>> until I wrote the UI that used the data that I realized exactly what
>>>>> the format of that data needed to be.  As a result, I had to go back
>>>>> and modify the notebook format.  After a few iterations of this, I
>>>>> realized that this approach was broken and started to enforce the
>>>>> following simple rule on myself: don't add it to the notebook format
>>>>> until I am ready to write the UI code that uses it.  That rule served
>>>>> me very well last summer.
>>>>>
>>>>> This is why for example the notebook and cells do not currently have
>>>>> any timestamp information (even though I think we will eventually want
>>>>> it).  The one notebook feature (which I regret adding to the format)
>>>>> that doesn't have a UI is the multiple worksheets.  We absolutely want
>>>>> that as a feature, I just wish I had waited to add it to the notebook
>>>>> format.  When we do implement the multiple worksheet UI, it is likely
>>>>> we will want to go back and make changes to the notebook format to
>>>>> better reflect the UI (for example, we will probably want to persist
>>>>> which worksheet is active/open).
>>>>
>>>>
>>>> I couldn't agree less.  There is simply no reason that adding support
>>>> for multiple worksheets in future versions of IPython should render
>>>> single-sheet notebooks unreadable in 0.13, just like adding new
>>>> metadata should not make the notebook artificially unreadable.
>>>
>>> I am not sure I am following you on this.  Are you suggesting that
>>> 0.14 notebooks (let's say we bump to a v4 nbformat with expanded
>>> worksheet support) should be readable in 0.13?
>>
>>
>> I think I am saying the opposite - with the current state of 0.13, adding
>> multi-worksheet support to the *javascript* should not result in
>> incrementing the notebook version.
>
> With the current state of the notebook format, I think we can probably
> pull this off.  So far, the only changes to the notebook format I can
> imagine will be minor version incrementing ones.
>
>>>
>>>
>>>>>
>>>>>
>>>>> For the cell and worksheet metadata, I knew we would eventually need
>>>>> it and I didn't want to hold up the beta release any longer.  But
>>>>> there are still unanswered questions related to it:
>>>>>
>>>>> * What types of things go in the metadata?
>>>>>
>>>>> * Is this an area for us to write data to, or for advanced users to
>>>>> write data to?
>>>>> * Is it entirely unstructured, or will we require a discussion for
>>>>> each new key/value entry into it?
>>>>>
>>>>>
>>>>> It is not at all clear that the current metadata design will hold up
>>>>> to our answers of these questions.  But in the end, I sort of wanted
>>>>> to add the metadata as it is now, so we could begin to see how we and
>>>>> others start to use it.  But just because we added the metadata to the
>>>>> notebook format definitely doesn't mean that future-proofs this part
>>>>> of the notebook format.
>>>>>
>>>>>
>>>>> Hope this clarifies things a bit.
>>>>
>>>>
>>>> Sure, while it is extremely clear that we need cell metadata, we cannot
>>>> be 100% certain that a simple dict will solve 100% of the cases we
>>>> encounter.  But adding it now means that we have at least a *chance*
>>>> of making a release that is not backwards-incompatible.
>>>
>>> Yes, I agree with this.
>>>
>>>>>
>>>>>
>>>>> Back to the question of output-level metadata.  When a bit of code
>>>>> remains unused for almost a year, I start to question whether we
>>>>> really need it.  I am not convinced we don't need it; I am just not
>>>>> sure.  In light of this, I don't think that adding it to the notebook
>>>>> format makes sense.  When one of us finds a good purpose for this
>>>>> metadata, let's add it to the nbformat then.
>>>>
>>>>
>>>> I believe the only current use is in the parallel display republishing,
>>>> where the engine ID is added to the display data so that frontends
>>>> could theoretically draw display data differently based on which
>>>> engine it came from.
>>>
>>> Yes, we have discussed this.  The only other situation where I
>>> remember thinking about this is if we wanted to use metadata to help a
>>> frontend interpret JSON display data.  There are numerous reasons code
>>> might display JSON data, and that code would have to help the frontend
>>> to know what to do with that data.
>>>
>>> Do you think the engine ID idea makes sense to implement or should
>>> that information just be passed in the formatted display data itself?
>>> We could also handle this by creating a custom JS widget that knows how to
>>> intelligently display data from multiple engines.
>>
>>
>> Right now I do both since the metadata is totally ignored, but I think it's
>> better to have less markup in the output itself.  It is precisely the same
>> reason we don't embed the rendered prompt in the output of execute replies -
>> frontends have their own way of rendering them (in the prompt column, etc.).
>>  The metadata could be used to do that for parallel results, rather than the
>> current behavior of having fake prompts in the general output area.
>
> OK if you think we want to go this route for displaying the engine
> IDs, then we should i) keep the display data metadata in the message
> itself and ii) move towards persisting that information in the
> nbformat.
>
>>>
>>>>>
>>>>>
>>>>> The other philosophical line of reasoning that I am being guided by
>>>>> here is simplicity.  It would be very easy to over design the notebook
>>>>> format and add all sorts of features that we might need.  I think this
>>>>> is a wrong direction to go.  We want a notebook format that is as
>>>>> compact and minimal as possible, where each and every bit of data is
>>>>> there for a well-defined and justified reason.
>>>>
>>>>
>>>> I think it's simple: We have had ideas over and over and over again for
>>>> features requiring metadata attached to cells (hashes, links,
>>>> timestamps, etc.), so this is clearly a feature we have a need for
>>>> right now.
>>>
>>> Yes - maybe I wasn't completely clear.  I do think that having cell
>>> and worksheet metadata right now does make sense.
>>>
>>>>  It would be totally silly for adding timestamps to require updating
>>>> the nbformat in a backward-incompatible way.
>>>
>>> And I am definitely not suggesting that it would or should.
>>>
>>>>  And the biggest advantage of using JSON is that adding keys has no
>>>> effect on backwards *readability*.  It's only adding values/types that
>>>> can cause problems, and should force new versions (e.g. changing
>>>> worksheet to worksheets, or adding new cell types).
>>>
>>> Yes, JSON indeed turned out to be much nicer than XML for this type of
>>> thing exactly because of this.
>>>
>>> But I am wondering what your thoughts are about newer notebook versions
>>> being readable by older IPython versions.  I have always thought that
>>> we would promise that older nbformats would *always* be readable by
>>> newer IPython versions, but that we would make no promises about newer
>>> nbformats being readable by older IPython versions.  I just want to
>>> clarify what other people are thinking in this respect.
>>
>>
>> Incrementing the nbformat means making notebooks unreadable in old
>> versions, yes.  This is very painful if we are doing it every six months.
>> I am only trying to make reasonable efforts that the current nbformat is
>> prepared for changes we *know* we intend to make soon, so that
>> incrementing the nbformat is reserved for changes we don't already have
>> planned, and aren't already prepared for.  Obviously, if we have a change
>> that we cannot fit into the current format, then we increment.
>
> I honestly can't think of any upcoming changes to the notebook format
> that we have thought about which would require a major version
> increment like you are talking about.  I think there are lots of minor
> ones that we can do using minor version increments.  I like the minor
> versioning scheme we have now as it clarifies our policies on this.
> So I think overall, the notebook format is pretty future safe for the
> time being.  I hope we can stick with the 3.x nbformats for a few
> IPython releases.

I'm curious what the effective difference between a minor version and
a major version would be to me, the user. Would you try to make minor
versions backward compatible if possible, either by not putting in new
keys if they don't need to be there or by somehow trying to future-proof
the notebook against unexpected notebook format changes?

Because as far as I, the user, am concerned, if a newer notebook
format version doesn't work at all in older versions of IPython (such
as is the case with notebook format v3 and IPython 0.12), then it
hardly matters how "major" or "minor" the changes were. Or maybe you
are thinking more for the benefit of people like Sage who are building
on top of the notebook API?
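
To make the question concrete, here is a minimal sketch (in Python) of
the policy I have in mind. It assumes the notebook JSON carries
top-level "nbformat" and "nbformat_minor" keys; the SUPPORTED_*
constants and the check_readable() helper are hypothetical names used
only for illustration, not IPython's actual reader:

```python
import json

# Hypothetical: the newest format this (older) reader was written against.
SUPPORTED_MAJOR = 3
SUPPORTED_MINOR = 0


def check_readable(nb_json):
    """Parse notebook JSON and decide whether this reader should open it.

    Returns (notebook_dict, warning_or_None).
    Raises ValueError if the notebook uses a newer major version.
    """
    nb = json.loads(nb_json)
    major = nb.get("nbformat", 1)
    minor = nb.get("nbformat_minor", 0)

    # A newer major version may change existing values/types: refuse.
    if major > SUPPORTED_MAJOR:
        raise ValueError(
            "notebook is nbformat v%d, this reader supports up to v%d"
            % (major, SUPPORTED_MAJOR)
        )

    # A newer minor version is assumed to only add keys: open the
    # notebook anyway, leave unknown keys untouched, and warn the user.
    warning = None
    if major == SUPPORTED_MAJOR and minor > SUPPORTED_MINOR:
        warning = (
            "notebook is v%d.%d, newer than v%d.%d; some data may be ignored"
            % (major, minor, SUPPORTED_MAJOR, SUPPORTED_MINOR)
        )
    return nb, warning
```

Under this reading, a newer minor version costs the user a warning and
unknown keys are simply carried along, while a newer major version is
refused outright.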

By the way, I completely agree with Brian that future proofing is
usually a waste of time. But also be careful about excessive "past
proofing". I would much rather see new features added to the notebook,
even every release, than to have them held back simply for the
purposes of keeping things backwards compatible. Also, if jumping the
gun on future proofing is a waste of time, so is spending a lot of
effort on making sure that new notebook versions work correctly in
older, unsupported releases.

Aaron Meurer

>
>> But where we are right now, adding to the metadata on cells or adding
>> multiple worksheets will *not* require bumping the nbformat.
>
> Right.
>
> Cheers,
>
> Brian
>
>>>
>>>
>>> Cheers,
>>>
>>> Brian
>>>
>>>> -MinRK
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Brian
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 19, 2012 at 3:25 PM, MinRK <benjaminrk at gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 19, 2012 at 3:23 PM, Brian Granger <ellisonbg at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Tue, Jun 19, 2012 at 3:19 PM, MinRK <benjaminrk at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 19, 2012 at 3:18 PM, Brian Granger
>>>>>>>> <ellisonbg at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Jun 19, 2012 at 2:59 PM, Fernando Perez
>>>>>>>>> <fperez.net at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> On Tue, Jun 19, 2012 at 1:17 PM, MinRK <benjaminrk at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Yes - we put metadata on outputs for a reason, presumably.  If
>>>>>>>>>>> this shouldn't be saved, it should probably be removed from the
>>>>>>>>>>> API.
>>>>>>>>>>
>>>>>>>>>> I can't recall precisely what we had in mind when we put it in,
>>>>>>>>>> but something that springs to mind as potentially useful, for
>>>>>>>>>> example, would be to specify a desired priority order for the
>>>>>>>>>> various types of outputs. Right now when a client can display
>>>>>>>>>> several kinds of output it just makes a choice, but we could let
>>>>>>>>>> objects provide a hint of the preferred order, based on what they
>>>>>>>>>> know about the relative quality of each.
>>>>>>>>>
>>>>>>>>> I originally put it there to allow objects to provide hints to
>>>>>>>>> the frontend on how it should display a representation.  This is
>>>>>>>>> similar to how the payloads can indicate where they came from.
>>>>>>>>>
>>>>>>>>>> So I'd vote for not removing this, as it may prove useful...
>>>>>>>>>
>>>>>>>>> I also think it could be useful, although it seems a bit
>>>>>>>>> excessive to store metadata for each output.  Here is what I
>>>>>>>>> propose.  We simply leave it alone until we have an actual use
>>>>>>>>> case that will help us figure out exactly what this should look
>>>>>>>>> like.  Without a concrete usage case, it is difficult to know
>>>>>>>>> what is needed.
>>>>>>>>
>>>>>>>>
>>>>>>>> But this doesn't answer the immediate question: Should this
>>>>>>>> metadata dict be included in the nbformat?
>>>>>>>
>>>>>>> I would vote no - not until we have a real usage case.  I don't like
>>>>>>> to add things to the notebook format until we are actually using
>>>>>>> them.
>>>>>>
>>>>>>
>>>>>> Then should we remove all of the metadata stuff we just added?  The
>>>>>> whole point was to prepare the nbformat for future changes so we don't
>>>>>> have to update the nbformat, which is incredibly painful and should be
>>>>>> done as rarely as possible.
>>>>>>
>>>>>> -MinRK
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> f
>>>>>>>>>> _______________________________________________
>>>>>>>>>> IPython-dev mailing list
>>>>>>>>>> IPython-dev at scipy.org
>>>>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Brian E. Granger
>>>>>>>>> Cal Poly State University, San Luis Obispo
>>>>>>>>> bgranger at calpoly.edu and ellisonbg at gmail.com