[IPython-dev] Extensible pretty-printing

Thu Oct 28 21:17:55 EDT 2010

On Thu, Oct 28, 2010 at 5:13 PM, Robert Kern <robert.kern at gmail.com> wrote:

>> OK, so how do you want to proceed: do you want to reopen your pull
>> request (possibly rebasing it if necessary) as it was, or do you want
>> to go ahead and implement the above approach right away?
>
> I'd rather implement this approach right away. We just need to decide what the
> keys should be and what they should mean. I originally used the ID of the
> DisplayFormatter. This would allow both a "normal" representation and an
> enhanced one both of the same type (plain text, HTML, PNG image) to coexist.
> Then the frontend could pick which one to display and let the user flip back and
> forth as desired even for old Out[] entries without reexecuting code. This may
> be a case of YAGNI.

Actually I don't think it's YAGNI, and I have a specific use case in
mind, with a practical example.  LyX shows displayed equations, but if
you copy one, it's nice enough to actually feed the clipboard with the
raw LaTeX for the equation.  This is very convenient, and I often use
it to edit complex formulas in LyX that I then paste into reST docs.

We could similarly have pretty display of e.g. sympy output, but where
one could copy the raw LaTeX for the output cell.  The UI could
expose this via a context menu that offers 'copy image, copy LaTeX,
copy string' for example.

So this does strike me as genuinely useful and valuable functionality.

> However, that means that the frontend needs to know about the IDs of the
> DisplayFormatters. It needs to know that 'my-tweaked-html' formatter is HTML. I
> might propose this as the fully-general solution:
>
> Each DisplayFormatter has a unique ID and a non-unique type. The type string
> determines how a frontend would actually interpret the data for display. If a
> frontend can display a particular type, it can display it for any
> DisplayFormatter of that type. There will be a few predefined type strings with
> meanings, but implementors can define new ones as long as they pick new names.
>
>   text -- monospaced plain text (unicode)
>   html -- snippet of HTML (anything one can slap inside of a <div>)
>   image -- bytes of an image file (anything loadable by PIL, so no need to have
> different PNG and JPEG type strings)
>   mathtext -- just the TeX-lite text (the frontend can render it itself)
>
> When given an object for display, the DisplayHook will give it to each of the
> DisplayFormatters in turn. If the formatter can handle the object, it will
> return some JSONable object[1]. The DisplayHook will append a 3-tuple
>
>   (formatter.id, formatter.type, data)
>
> to a list. The DisplayHook will give this to whatever is forming the response
> message.
>
> Most likely, there won't be too many of these formatters for the same type
> active at any time and there should always be the (id='default', type='text')
> formatter. A simple frontend can just look for that. A more complicated GUI
> frontend may prefer a type='html' response and only fall back to a type='text'
> format. It may have an ordered list of formatter IDs that it will try to display
> before falling back in order. It might allow the user to flip through the
> different representations for each cell. For example, if I have a
> type='mathtext' formatter showing sympy expressions, I might wish to go back to
> a simple repr so I know what to type to reproduce the expression.
>
> I'm certain this is overengineered, but I think we have use cases for all of the
> features in it. I think most of the complexity is optional. The basic in-process
> terminal frontend doesn't even bother with most of this and just uses the
> default formatter to get the text and prints it.
>
> [1] Why a general JSONable object instead of just bytes? It would be nice to be
> able to define a formatter that could give some structured information about the
> object. For example, we could define an ArrayMetadataFormatter that gives a dict
> with shape, dtype, etc. A GUI frontend could display this information nicely
> formatted along with one of the other representations.
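
To make sure I'm reading the proposal right, here is a minimal sketch of
the flow as I understand it.  None of this is existing IPython code: the
class names, the format() method, the 'list-html' ID and the use of None
to mean "I can't handle this object" are all my own assumptions, just
for illustration.

```python
import html
import json


class DefaultTextFormatter:
    """Hypothetical stand-in for the (id='default', type='text') formatter."""
    id = "default"
    type = "text"

    def format(self, obj):
        return repr(obj)


class ListHtmlFormatter:
    """Hypothetical HTML formatter that only handles lists and tuples."""
    id = "list-html"
    type = "html"

    def format(self, obj):
        if not isinstance(obj, (list, tuple)):
            return None  # assumption: None means "cannot handle this object"
        items = "".join("<li>%s</li>" % html.escape(repr(x)) for x in obj)
        return "<ul>%s</ul>" % items


def collect_representations(obj, formatters):
    """Ask each formatter in turn and build the list of 3-tuples."""
    results = []
    for formatter in formatters:
        data = formatter.format(obj)
        if data is None:
            continue
        json.dumps(data)  # raises TypeError if the data is not JSONable
        results.append((formatter.id, formatter.type, data))
    return results


def pick_representation(results, preferred_types=("html", "text")):
    """Frontend side: return the first entry matching the preferred types."""
    for wanted in preferred_types:
        for entry in results:
            if entry[1] == wanted:
                return entry
    return None


formatters = [DefaultTextFormatter(), ListHtmlFormatter()]
reps = collect_representations([1, 2], formatters)
print(pick_representation(reps))
```

A terminal frontend would call pick_representation(reps, ("text",)) and
ignore everything else, while a GUI could keep the whole list around and
let the user flip between entries.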

Most of this I agree with.  Just one question: why not use real MIME
types for the type info?  I keep thinking that for our payloads and
perhaps also for this, we might as well encode type metadata as
MIME types: they're reasonably standardized, Python has a MIME library,
and browsers are wired to do something sensible with MIME-tagged data
already.  Am I missing something?
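
As a rough sketch of what I have in mind, using only the stdlib
mimetypes and fnmatch modules: the TYPE_TO_MIME mapping, the helper
names and the 'image/*' wildcard convention are assumptions of mine, not
anything we have agreed on.  I've left 'mathtext' out since I don't know
what MIME type we'd pick for it.

```python
import mimetypes
from fnmatch import fnmatchcase

# Hypothetical mapping from two of the proposed type strings to MIME types.
TYPE_TO_MIME = {
    "text": "text/plain",
    "html": "text/html",
}


def guess_mime(filename):
    """Let the stdlib guess a MIME type from a file name's extension."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def can_display(mime, accepted=("text/plain", "image/*")):
    """Frontend side: check a MIME type against the patterns it accepts."""
    return any(fnmatchcase(mime, pattern) for pattern in accepted)


print(guess_mime("plot.png"))
print(can_display("image/jpeg"), can_display(TYPE_TO_MIME["html"]))
```

The wildcard gives the same effect as the single 'image' type string
above: a frontend that can load any image format just accepts 'image/*'.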

>> If the latter, I'm not sure I like the approach of passing a dict
>> through and letting each formatter modify it.  State that mutates
>> as-it-goes tends to produce harder to understand code, at least in my
>> experience.  Instead, we can call all the formatters in sequence and
>> get from each a pair of key, value.  We can then insert the keys into
>> a dict as they come on our side (so if the storage structure ever
>> changes from a dict to anything else, likely the formatters can stay
>> unmodified).  Does that sound reasonable to you?
>
> That's actually how I would have implemented it [my original ipwx code
> notwithstanding ;-)].
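
Just to pin down what I mean by "a pair of key, value", something along
these lines.  The function names and the keys are made up for the
example, and returning None for "not applicable" is an assumption.

```python
import html


def text_formatter(obj):
    """Hypothetical formatter: always applicable, returns (key, value)."""
    return ("text", repr(obj))


def html_formatter(obj):
    """Hypothetical formatter: only handles strings, otherwise declines."""
    if isinstance(obj, str):
        return ("html", "<pre>%s</pre>" % html.escape(obj))
    return None  # assumption: None means "not applicable"


def format_all(obj, formatters):
    """Call every formatter in sequence; only the caller touches the dict."""
    result = {}
    for formatter in formatters:
        pair = formatter(obj)
        if pair is None:
            continue
        key, value = pair
        result[key] = value
    return result


print(format_all("a<b", [text_formatter, html_formatter]))
```

The formatters never see the dict, so if we later swap it for a list of
tuples or anything else, only format_all() has to change.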

OK.  It seems we're converging design-wise to the point where code can
continue the conversation :)

Thanks!

Cheers,

f