[XML-SIG] ANN: SLiP and SLIDE - a quick XML shorthand syntax and tool for editing

BudP.Bruegger BudP.Bruegger
Mon, 19 Aug 2002 19:28:19 +0200


(A little late,) I have noted the announcement of SLiP and the followup
discussion on XML shorthand on this list.  Have you guys followed up on
this topic, and are you working on a joint specification/implementation?
I would be interested to join in.  In the following I describe some ideas.

In 1999, I did some work on an XML shorthand syntax in the context of a
successor of Ian Clatworthy's SDF (see
http://www.cpan.org/modules/by-authors/id/IANC/sdf-2.000.readme).  Some
of the ideas may be quite applicable to the discussion here.  Here is a
modernized/simplified version (that takes your postings into account):

[see also examples below]

I think of the XML document in terms of DOM nodes where nodes are either
elements, attributes, text, or comments.  (Maybe processing
instructions, entities, etc. should also be added.)

Generally, each node starts on a line of its own and is often contained
in a single line.  (Note that PYX uses a similar approach, see for example
http://www.xml.com/pub/a/2000/03/15/feature/index.html).  This makes
parsing easy, makes the structure more explicit, and possibly allows
ad-hoc analysis using tools such as grep and similar.  Most importantly,
however, this also makes it much easier to mix text nodes with element
nodes, such as in the following example:
"this is a <strong>bold</strong>word".

The hierarchy of nodes is defined by (pythonic) indentation.
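
To illustrate how indentation alone could define the hierarchy, here is a
minimal Python sketch.  The function name nest_by_indentation and the
sample element name *address are my own assumptions for illustration, not
part of any existing implementation.  It only computes the nesting depth
of each line; it does not handle multi-line text.

```python
def nest_by_indentation(lines):
    """Return (depth, content) pairs, with depth derived from leading spaces."""
    stack = []   # indentation widths of the currently open ancestor nodes
    result = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no node
        indent = len(line) - len(line.lstrip(" "))
        # Close every ancestor that is indented at least as far as this line.
        while stack and stack[-1] >= indent:
            stack.pop()
        result.append((len(stack), line.strip()))
        stack.append(indent)
    return result


if __name__ == "__main__":
    sample = [
        "*address",            # hypothetical root element
        '  @type "home"',
        "  *street",
        '    "123 Sesame Street"',
        "  *city",
    ]
    for depth, content in nest_by_indentation(sample):
        print(depth, content)
```

A stack of indentation widths is enough here because a node's parent is
always the nearest preceding line with a smaller indentation.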

An element has the format 

An attribute has the format
(WS=white space without newline)

A comment has the format
(WS=white space without newline)
Note that this does not seem to conflict with the use of '#' in URLs.

A text node has the format 

or on multiple lines:


  Note that occurrences of '"' or '"""' after the opening '"' or '"""' must
  be escaped with '\'.  Similarly, the special characters "*", "@", "#",
  "%" as first characters of a multi-line text need to be quoted with
  Also, it is on purpose that the multi-line text starts only on the line
  below the '"""'--I like the indentation better, particularly for
  Note also that instead of (single or triple) double quote characters,
  single quote characters ("'") could be used equivalently.
  Also, the indentation is stripped off in the equivalent XML document.
  This prevents multi-line (possibly pre-formatted) text from breaking the
  visual structure of the document.
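
The stripping of indentation could be sketched as follows in Python,
assuming the lines of a multi-line text have already been collected.  The
function name strip_block_indentation is hypothetical; textwrap.dedent is
the standard library call that removes the whitespace common to all lines.

```python
import textwrap


def strip_block_indentation(block_lines):
    """Join the lines of a multi-line text and remove their shared indentation."""
    return textwrap.dedent("\n".join(block_lines))


if __name__ == "__main__":
    comment = [
        "      Please leave packages with Grouch in",
        "      garbage can next door.",
    ]
    print(strip_block_indentation(comment))
```

Only the indentation shared by all lines is removed, so relative
indentation inside pre-formatted text survives.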

For rapid authoring, a text node is not really a strict text node: the
common special characters should automatically be quoted as character
entities.  For example, "1 > 0" really represents "1 &gt; 0".
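
As a small illustration of this automatic quoting, the Python standard
library already provides a suitable function.  The wrapper name
normal_text is my own assumption; xml.sax.saxutils.escape replaces "&",
"<", and ">" with the corresponding entities.

```python
from xml.sax.saxutils import escape


def normal_text(text):
    """Quote the characters that are special in XML character data."""
    return escape(text)


if __name__ == "__main__":
    print(normal_text("1 > 0"))
```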

Also, in single-line text, if line breaks need to be expressed,
using '\n' to represent them is highly beneficial (see example 2).
If entity substitution (quoting) is not desired, and for many other
purposes, variations of the plain text node can be used.  They have the
following format:

or in the case of multiple lines:

<indentation>%<textType><optionalWS>""" ...

While the approach really offers unlimited possibilities, I have thought
of the following possible text types:

  * normal:  i.e., equivalent to stating no %<textType>
  * raw:     no quotation or different substitution.  This also allows
             embedding xml into the shorthand document
  * structuredText or st:  This makes it easy to write lists, emphasized
  * stNG:    same in different version
  * sh:      the resulting text returned by some shell command
  * py:      the resulting text returned by some python expression
  * incl:    inclusion of the content of an external file (could be done
  * img:     translates to an html image element but smartly computes
             changes format, creates thumbnail, or similar.  
  * table:   some smart way of making tables (see for example the
             approaches in SDF that were quite easy and successful--I
             always used a fixed format one)

Obviously, there are unlimited possibilities, and the shorthand package
should come with a small predefined library and a simple mechanism to add
one's own.  (I implemented a web template system many years ago in perl
that used this approach very successfully).  While I haven't thought this
through in detail, the text processing code could either expect a certain
formatting (as in ordered lists or tables) and/or use attributes that are
associated with the text node.  (Note that this is an extension of the DOM
model, where text nodes cannot have attributes).
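
One way such a mechanism might look is a simple registry that maps a text
type name to a processing function.  This is only a sketch under my own
assumptions: the names TEXT_TYPES, text_type, and render_text are
hypothetical, and only the "normal" and "raw" types from the list above
are shown.

```python
from xml.sax.saxutils import escape

# Hypothetical registry: text type name -> function turning text into XML.
TEXT_TYPES = {}


def text_type(name):
    """Decorator that registers a processing function under a type name."""
    def register(func):
        TEXT_TYPES[name] = func
        return func
    return register


@text_type("normal")
def normal(text):
    # Default behaviour: quote the XML special characters.
    return escape(text)


@text_type("raw")
def raw(text):
    # No substitution at all, so embedded xml passes through unchanged.
    return text


def render_text(text, type_name="normal"):
    """Process text with the handler registered for type_name."""
    return TEXT_TYPES[type_name](text)


if __name__ == "__main__":
    print(render_text("a < b"))
    print(render_text("<b>x</b>", "raw"))
```

Adding one's own type then amounts to writing one function and decorating
it with @text_type("someName").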

Some useful shortcuts:
  While I proposed to start each DOM node on a separate line, here is a
  possible exception that makes it possible to be terser (but makes it
  impossible to analyse the document using easy grepping).  In the case
  where an element has no attributes and contains a single, single-line
  text node, the text can be added to the same line as the element, as in
  the line *street "123 Sesame Street" of example 1.  Similarly, in the
  case of a multi-line text following an attribute-less element, one can
  write:

  <indentation>*<optionalNS:><elementName><optionalWS>"""
  <incrementedIndentation><firstTextLine> ...

The implementation seems rather straightforward.  In particular, the
type of node can be detected by a parser looking simply at the beginning
of each line.
One source of ambiguity that has to be solved is how to differentiate
empty elements from non-empty elements without content.  For example, how
should the shorthand differ for <empty/> and <nonempty></nonempty>?  One
possible solution would be to use "*/" instead of "*" to prefix empty
elements.

While converting shorthand to xml is trivial, some more challenges are
to be expected when going from xml back to shorthand.  There is an
ambiguity about which types of text to use and how to choose.  In case it
is possible to come up with
simple rules for the kinds of text to choose (always normal except for
and TABLE elements), it is easy.  Normal text may optionally be wrapped
differently.  Since I propose an open framework, the mapping is not
always that well defined.  Maybe each text module needs to define some
rules for when it can be applied?

One strength of the approach seems to be that--in the case of document
centric xml--it can precisely preserve line breaks and white space that,
for example, make a difference in how web browsers render (x)html.

Anyhow, I hope this is of interest, and I would be happy to discuss more
and participate in an implementation.

kind regards


----------- example 1 -------------------

  # "this is an example of my xml shorthand ideas"
    @type "home"
    *street "123 Sesame Street"
    *city "Wonderland"
    *state "CA"
    *zipCode "90012"
    *comment """
      Please leave packages with Grouch in
      garbage can next door."""

----------- example 2 -------------------

  @attr1 "cool"
  @attr2 'moose'
    "Some words are "
    *strong "bold"
    " and some are not.\n"
    note that this is a single line consisting of
    two text nodes that surround a _strong_ element"""
    It would be rather more cumbersome to write

    * nested unordered lists
       * such as this
       * and others
    * and ordered lists or tables

    without the use of _structured text_.'''
    *someNestedElement '''
      note that the flexible use of single or double quote characters 
      makes quoting of " and ' easier even when they are tripled as in
      """ or \'''.'''
      "this is equivalent to*"
    *foo "this format here"
       this is the same (format) equivalence
       example on multiple lines"""
    *note """
       with two lines
       of text here"""

| Bud P. Bruegger, Ph.D. 
| Sistema (www.sistema.it)
| Via U. Bassi, 54
| 58100 Grosseto, Italy
| +39-0564-411682 (voice and fax)