[XML-SIG] Re: PyDoc/XML?

uche.ogbuji@fourthought.com uche.ogbuji@fourthought.com
Wed, 29 Sep 1999 10:58:10 -0400


[Note: you might want to suggest your ideas to the doc-sig, which, as Fred 
Drake pointed out to me, is also relevant]

> Exactly. That's _my_ question. I guess you are confirming my hunch that XML
> has some limitations for creating markup-languages that are 'friendly' and 
> 'readable' enough for humans to manipulate. 

And many very intelligent people, including yourself, have complained about 
this.  I think it's a one man's meat is another man's poison sort of thing.  I 
find XML as readable as any other presentation of such structure.  Some say 
that optional end-tags, etc. would help, but I often find the un-closed "<LI>" 
and "<P>" in HTML a bit more confusing.

I don't think anyone will ever come up with a markup-language that's 
universally readable and friendly, because the essence of a markup language is 
to make a clear distinction between data and mark-up.  This distinction is 
bound to cause either impaired readability, if it is too sharp, or structural 
confusion, if it is too blurred.  My first impression of RML is that it tends 
toward the latter.

> Years ago I created a markup language called RML (Report Markup Language)
> to allow documents to be marked up as 'boiler plate' for automatic report
> generation. The markup language was very flexible and both declarative and
> procedure in nature (you could choose mostly declarative or mostly 
> procedural, or anything in between). It was very friendly because you could
> use WYSIWYG features of the word processor (DecWrite) as markups, or you
> insert RML markups (.BOLD .CENTER etc). The RML parser read and interpreted
> the document's structure along with the RML markups.
> 
> One of the first issues I had to resolve was this: Is RML embedded in the
> document or is the document embedded in RML? The answer is: neither. A marked
> up document is a _union_ of two languages, RML and the native structure of
> the document. Together they form a single language which can be interpreted
> and used to generate virtually any kind of document.

I think this depends greatly on the application.  In most cases where XML is 
used the mark-up _is_ the structure of the document.  I know that the SGML 
world has some powerful models for incorporating non-SGML markup aspects into 
a document or incorporating SGML into non-markup data.  I just learned enough 
about groves to be dangerous from Paul Prescod's _brilliant_ precis on the 
XML-Dev list and the ensuing conversation, but I would venture to say that the 
SGML parts would simply be nodes conforming to the SGML property-set (element, 
attribute, content, etc.), and one could determine a property-set for the 
non-SGML parts of the document, and combine them richly in the sort of way you 
are proposing.  However, I don't think the XML community has embraced groves, 
and there's not too much support in the prevalent standard for mixed-mode 
documents except to extract the separate layers and process them separately.

> RML has evolved beyond DecWrite and report generation and is now a scripting
> language superficially (quite coincidentally) resembling Python (without the
> nasty indenting ;)

Another aesthetic concern in which we disagree.  Again, one man's sugar is 
another's arsenic.  The clean look that results from the indentation rules is 
one of the things that drew me to Python.

> I have written a simple XML parser for RML, attempting to conform to the 1.0
> Spec for well-formedness. (It also does HTML as a selectable option). I use
> the parser exclusively through a SAX-like event interface. (Will do some 
> DOM stuff eventually). 

Is this so that you can insert RML markup (".BOLD", etc) into XML docs?  
Vice-versa?  Both?

> Going back to my insight that a marked-up document is a union of two
> languages.

In some cases.

> A programming language source file is an ideal candidate for my kind of RML
> processing because both languages are highly structured and machine readable.
> 
> My goals for producing reference documents from source files are:
> 1. Avoid marking up anything that can be deduced by syntactic recognition
>    of the language itself. Thus it is foolish to make lists of function names
>    variables, data types etc since that can all be figured out with more or
>    less perfect accuracy by a parser.
> 2. Look for markups in the comments that reflect some aspect of the document
>    organization (Author, dates, purpose) Usually a set of pretty tags would
>    be helpful.
> 3. Automatically (or with slight assist from a pretty tag) recognize free-form
>    comments that describe the functionality of the code and organize them
>    in some sensible way. Allow structural tags to be embedded in this
>    free-form stuff.

I disagree that there is _any_ way to mark-up code that won't get in the way 
of some programmers (such as myself).  That is why I advocate rich linking 
from code to separate documentation.

<snip> 

> I would like to use tokens like 'AUTHOR:' and 'FILE:' as actual markups
> mixed with other free-style comments.
> 
> So this is the auto-doc markup language for creating manuals out of source
> code that I have in mind. And I want XML to play a role here, but not
> clear what that role is. (A 'solution' in search of a 'problem') It looks
> like using XML to create the actual markup language is not possible,
> rather I will need an external program to extract/create the structure
> and semantics of the goal document, then use XML as a representation language,
> from which HTML, PostScript, RTF, or whatever can be generated. That sounds 
> feasible.

This makes most sense for your purposes.
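
To make the idea concrete, here is a rough sketch of such an external 
extractor in Python.  It is only an illustration under my own assumptions: the 
tag pattern, the element names ("module-doc", "comment") and the sample source 
are all made up for the example, and it uses the standard tokenize and 
xml.etree.ElementTree modules rather than anything RML-specific.

```python
import io
import re
import tokenize
import xml.etree.ElementTree as ET

# Hypothetical convention: a comment such as "# AUTHOR: ..." is a markup token.
TAG_RE = re.compile(r"^#\s*([A-Z]+):\s*(.*)$")


def comments_to_xml(source):
    """Return an XML string describing the tagged and free-form comments."""
    root = ET.Element("module-doc")  # element name is an assumption
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.COMMENT:
            continue
        match = TAG_RE.match(tok.string)
        if match:
            # A recognized token becomes an element named after the tag.
            elem = ET.SubElement(root, match.group(1).lower())
            elem.text = match.group(2).strip()
        else:
            # Everything else is kept as free-form commentary.
            elem = ET.SubElement(root, "comment")
            elem.text = tok.string.lstrip("#").strip()
    return ET.tostring(root, encoding="unicode")


# Made-up sample source, used only to exercise the function.
SAMPLE = """\
# FILE: report.py
# AUTHOR: A. Hacker
# Builds the monthly report.
def build():
    pass
"""

print(comments_to_xml(SAMPLE))
```

The XML that comes out is the representation layer; HTML, PostScript or RTF 
would be generated from it in a separate step.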

> I am trying to learn XML by developing my own tools. This has introduced
> me to some of the more subtle aspects of XML and has caused me to revise
> my opinion of what XML is. (Actually, I think my XML 'evangelists' don't
> really know exactly what XML is and are abusing it by proposing it for
> object databases and other somewhat mis-appropriate uses). I see XML
> strictly as a way of marking up a 'document' to expose its structure and
> semantics. Sure, documents are trees, like databases, but doesn't necessarily
> imply XML is a good way to implement a database. (Basic necessities such
> as query languages don't exist, yet).

Hmm.  Anyone can go about the villages and try selling XML as the cure for all 
ills, but I think you'll find that most "evangelists" who know what they're 
talking about do not claim XML for more than it is: a standard mark-up 
language with roots in a much more sophisticated and complex ancestor (SGML) 
with a very fast-growing community.  This last bit is key.  There is nothing 
magic about angle-bracket mark-up or any of that.  XML's magic comes from the 
fact that so many people are using it and that so many tools are emerging for 
it.

I haven't heard anyone claim that XML should replace _any_ binary data-store, 
such as object-database back-ends, although I have heard people advocating XML 
as a serialization between such back-ends, and I think this is an excellent 
idea.  BTW, you can do basic queries on XML using XPath.  It doesn't allow for 
combinatorial queries or any of that (that's what XQL is being developed for), 
but it's something.
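
As a small illustration of what a basic XPath-style query looks like, here is 
a sketch using the limited XPath subset in Python's standard 
xml.etree.ElementTree module.  The choice of that module and the sample 
document (its element and attribute names) are assumptions made for the 
example, not a description of any particular tool discussed here.

```python
import xml.etree.ElementTree as ET

# Made-up document, used only to have something to query.
DOC = """
<module-doc>
  <function name="build" author="A. Hacker"><comment>Builds the report.</comment></function>
  <function name="send" author="B. Coder"><comment>Mails the report.</comment></function>
</module-doc>
"""

root = ET.fromstring(DOC)

# Path expression: every "function" element anywhere below the root.
names = [f.get("name") for f in root.findall(".//function")]

# Predicate: only the functions whose "author" attribute has a given value.
by_author = root.findall(".//function[@author='B. Coder']")

print(names)
print(by_author[0].find("comment").text)
```

Selection by path and by attribute value is about as far as this goes; 
anything resembling a join across elements has to be written by hand.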

Finally, XML is about more than just documents.  Data-serialization usually 
doesn't have much in common with documents, but XML works pretty well in that 
space.
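
A minimal sketch of that kind of use, assuming a flat record of string fields 
and a wire format I invented for the example ("record" and "field" are 
hypothetical element names): one side serializes the record to XML and the 
other side reads it back.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize a flat dict of string fields to an XML string."""
    elem = ET.Element("record")
    for key, value in record.items():
        field = ET.SubElement(elem, "field", name=key)
        field.text = value
    return ET.tostring(elem, encoding="unicode")


def xml_to_record(text):
    """Rebuild the dict from the XML produced by record_to_xml."""
    elem = ET.fromstring(text)
    # An empty field parses with text None, so map that back to "".
    return {f.get("name"): f.text or "" for f in elem.findall("field")}


# Made-up record standing in for a row or object from some back-end.
record = {"id": "42", "owner": "someone", "status": "open"}
wire = record_to_xml(record)
print(wire)
print(xml_to_record(wire) == record)
```

Nothing here is document-like, yet the same parsers and tools apply.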

Be well.

-- 
Uche Ogbuji
FourThought LLC, IT Consultants
uche.ogbuji@fourthought.com	(970)481-0805
Software engineering, project management, Intranets and Extranets
http://FourThought.com		http://OpenTechnology.org