XML overuse? (was Re: Python to XML to Python conversion)

Huaiyu Zhu huaiyu at gauss.almadan.ibm.com
Thu Jul 18 14:10:40 EDT 2002


Clark C . Evans <cce at clarkevans.com> wrote:
>On Tue, Jul 16, 2002 at 10:14:51PM +0000, Huaiyu Zhu wrote:
>| Thanks a lot for this link.  The basic idea is very similar, but apparently
>| they have done a lot more of formal specification than I have ever
>| attempted.  There are several differences in the details, so neither is
>| a superset of the other.  I'll comment on the differences once I have time to
>| read through their docs.
>
>I look forward to the commentary, could you do it or cc the 
>YAML discussion list?

That will have to wait until I have time to read through the YAML docs and
review my old code and docs.
    
>| The emphasis is on using indentation and leading markers to denote
>| structure, in contrast to the markup, punctuation, quotes and escapes in the
>| markup languages.
>
>Exactly.   We started with leading markers (% and @ initially) and
>eventually found ways that allowed us to skip these...

Like minds think alike. :-)  Perl opened my mind to the possibility of
heterogeneous hierarchical data structures.

>I'd love to hear about the overlap; I'm sure we don't do everything.
>But if you found something important that we don't have, I'd love to
>know since we'd like to start finalizing the spec at this time so that
>implementations can start emerging.
>
>I'd love to hear more about your thoughts on YAML, and if possible,
>we'd really welcome your participation!

I'll try to find time to participate, but time is always in short supply.  

Here are some comments at first glance.  I don't see a description of the
semantics of the structures independent of any syntax.  It is possible to
define all the canonical transforms among the structures [1] without
reference to any particular representation.  I'd also like to emphasize that
all the indentation, markers, etc. should be configurable within a
document [2][3].

[1] Canonical transforms, such as {a, b, c} -> [a, b, c] -> {(1:a), (3:c),
	(2:b)}.  There are a few dozen of them among set, seq, dict and seqdict.
	Some have partial inverses.  None of them is a one-to-one correspondence.
	That's why I keep all four as basic structures.  These four are the
	combinations of keyed/non-keyed and ordered/unordered.  Additional kinds
	of structures, such as bags (keyed or not, ordered or not), may be
	added later on. [4]
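A minimal Python sketch of the four structures and a few of the transforms
in [1] might look like the following.  The mapping onto Python types
(frozenset, tuple, dict, OrderedDict) and all the function names are
assumptions for illustration, not part of any existing spec.

```python
from collections import OrderedDict

# Assumed mapping of the four basic structures onto Python types:
#   set     (unordered, non-keyed) -> frozenset
#   seq     (ordered, non-keyed)   -> tuple
#   dict    (unordered, keyed)     -> dict (equality ignores key order)
#   seqdict (ordered, keyed)       -> OrderedDict (equality respects order)

def set_to_seq(s):
    # Going from unordered to ordered requires choosing an order;
    # sorting is just one arbitrary choice.
    return tuple(sorted(s))

def seq_to_dict(seq):
    # Key each element by its 1-based position: [a, b, c] -> {1:a, 2:b, 3:c}.
    return {i: x for i, x in enumerate(seq, start=1)}

def seq_to_seqdict(seq):
    # Same keys as seq_to_dict, but the key order is part of the value.
    return OrderedDict(enumerate(seq, start=1))

def dict_to_seq(d):
    # Partial inverse of seq_to_dict: only defined when the keys are 1..n.
    n = len(d)
    if set(d) != set(range(1, n + 1)):
        raise ValueError("keys are not the positions 1..n")
    return tuple(d[i] for i in range(1, n + 1))

def seq_to_set(seq):
    # Forgets order and duplicates, so in general it cannot be inverted.
    return frozenset(seq)

seq = set_to_seq(frozenset("cab"))     # ('a', 'b', 'c')
d = seq_to_dict(seq)                   # {1: 'a', 2: 'b', 3: 'c'}
assert d == {1: "a", 3: "c", 2: "b"}   # dict equality ignores order
assert dict_to_seq(d) == seq
```

Here dict_to_seq undoes seq_to_dict, while seq_to_set maps ('a', 'b', 'a')
and ('b', 'a') to the same set, which is one concrete way the transforms
fail to be one-to-one.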

[2] I tried the following kinds of indentation (where n is the nesting level):
	  '(%s)' % n
	  '  ' * n
	  '  ' * n + '|'
	Obviously there can be many other variations.  Such flexibility
	would allow many common document formats to be transformed into a
	conforming format with minimal effort, sometimes just by adding a
	metacomment at the beginning of the document.  For example, the format
	of these very footnotes should be accommodated.
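The configurable indentation in [2] could be sketched by treating each style
as a function from nesting level to line prefix.  STYLES, render, parse and
the "#indent:" metacomment are hypothetical names and syntax chosen for this
sketch.

```python
# Hypothetical indentation styles from [2]: each maps nesting level n to a prefix.
STYLES = {
    "paren": lambda n: "(%s)" % n,
    "spaces": lambda n: "  " * n,
    "bar": lambda n: "  " * n + "|",
}

def render(items, style):
    """Render (level, text) pairs, one per line, with the given prefix function."""
    return "\n".join(style(level) + text for level, text in items)

def parse(doc, style, max_level=20):
    """Recover (level, text) pairs, trying the deepest level first."""
    items = []
    for line in doc.splitlines():
        for level in range(max_level, -1, -1):
            prefix = style(level)
            if line.startswith(prefix):
                items.append((level, line[len(prefix):]))
                break
        else:
            raise ValueError("line does not match the style: %r" % line)
    return items

def parse_document(doc):
    """Read a hypothetical '#indent: <style>' metacomment, then parse the rest."""
    first, _, body = doc.partition("\n")
    if not first.startswith("#indent: "):
        raise ValueError("missing indentation metacomment")
    return parse(body, STYLES[first[len("#indent: "):].strip()])

tree = [(0, "root"), (1, "child"), (2, "leaf"), (1, "sibling")]
for style in STYLES.values():
    assert parse(render(tree, style), style) == tree
```

Trying the deepest level first keeps '  ' from shadowing '    ' in the
space-based styles.  Text that itself begins with a style's prefix
characters would still be ambiguous, which a real format would have to
settle with escaping or a stricter rule.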

[3] I would allow encoding and encryption to be specified on a per-node
	basis, not just at the file level.  In reality, how to break up a tree
	into subtrees that fit in files is largely arbitrary.  This calls for
	metacomments on each node, with a simple syntax for describing them.
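Per-node metacomments as in [3] might look like this sketch, where a
hypothetical "#! encoding=..." line applies only to the node that follows
it.  Base64 stands in for a real encoding; encryption could plug into the
same hypothetical CODECS table but is not implemented here.

```python
import base64

# Hypothetical per-node codecs: name -> (encode, decode).
CODECS = {
    "plain": (lambda s: s, lambda s: s),
    "base64": (lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
               lambda s: base64.b64decode(s).decode("utf-8")),
}
META = "#! encoding="  # hypothetical metacomment marker

def dump(nodes):
    """nodes: list of (level, encoding, text); non-plain nodes get a metacomment."""
    lines = []
    for level, enc, text in nodes:
        indent = "  " * level
        if enc != "plain":
            lines.append(indent + META + enc)
        lines.append(indent + CODECS[enc][0](text))
    return "\n".join(lines)

def load(doc):
    """Inverse of dump: a metacomment sets the encoding of the next node only."""
    nodes, pending = [], "plain"
    for line in doc.splitlines():
        stripped = line.lstrip(" ")
        level = (len(line) - len(stripped)) // 2
        if stripped.startswith(META):
            pending = stripped[len(META):]
            continue
        nodes.append((level, pending, CODECS[pending][1](stripped)))
        pending = "plain"
    return nodes
```

Only the marked node is transformed, so a subtree could be encoded (or, with
a real cipher, encrypted) without touching the rest of the file.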

[4] One thing I have not solved is whether keys can only be strings.  If
	keys can be substructures themselves, there are further correspondences
	between sets, dicts and bags, such as {a, b} -> {a:1, b:1}.  This leads
	to the issue of the identity of structures.  Example: {a, b}=={a} if
	a==b.  This complicates things, and that's perhaps where I stopped.
	(Over-generalization perhaps?)
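Python happens to illustrate the identity problem in [4] directly: keys that
are structures must be hashable (frozenset below), and two equal structures
collapse into a single member.  The helper names are hypothetical.

```python
from collections import Counter

def set_to_dict(s):
    # {a, b} -> {a: 1, b: 1}: every member of a set has multiplicity 1.
    return {k: 1 for k in s}

def bag_to_dict(items):
    # A bag (multiset) maps each distinct member to its count.
    return dict(Counter(items))

# Structures used as keys: frozenset is an immutable, hashable set.
a = frozenset({1, 2})
b = frozenset({2, 1})
assert a == b
assert {a, b} == {a}           # {a, b} == {a} because a == b
assert set_to_dict({a, b}) == {a: 1}
```

Because a == b, the two-element set literal has only one member, so any
transform defined on sets of structures inherits whatever notion of equality
the structures themselves use.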

So my overall comment is that this approach can be made more 'meta' than any
particular syntax or structure would allow.  The worst thing about XML is
that one has to conform to its (mostly arbitrary) syntax conventions instead
of thinking about the underlying data structure that's pertinent to the
task at hand.  I do believe that the good thing about standards is that there
are so many to choose from.  A meta syntax would open up the possibility of
interoperability on a much larger scale than XML could handle comfortably.
It is often easier to define a particular syntax by fixing some parameters
in a meta syntax.  Perhaps all this is already in YAML; I have only spent
half an hour reading its docs.
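As a rough illustration of defining a particular syntax by fixing parameters
in a meta syntax, here is a toy serializer whose markers are all arguments;
functools.partial then pins them down to produce particular dialects.  The
serializer and both dialects are hypothetical, and the YAML-like output is
not claimed to conform to the YAML spec.

```python
from functools import partial

def serialize(obj, indent="  ", item="- ", sep=": ", level=0):
    """Toy meta-serializer: indentation and markers are all parameters.

    Returns a list of lines for nested dicts, lists and scalars.
    """
    pad = indent * level
    lines = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                lines.append(pad + str(k) + sep.rstrip())   # e.g. "tags:"
                lines.extend(serialize(v, indent, item, sep, level + 1))
            else:
                lines.append(pad + str(k) + sep + str(v))   # e.g. "name: demo"
    elif isinstance(obj, list):
        for v in obj:
            if isinstance(v, (dict, list)):
                lines.append(pad + item.rstrip())           # e.g. "-"
                lines.extend(serialize(v, indent, item, sep, level + 1))
            else:
                lines.append(pad + item + str(v))           # e.g. "- a"
    else:
        lines.append(pad + str(obj))
    return lines

# Two hypothetical dialects obtained by fixing the parameters.
yaml_like = partial(serialize, indent="  ", item="- ", sep=": ")
ini_like = partial(serialize, indent="\t", item="* ", sep=" = ")
```

For {"name": "demo", "tags": ["a", "b"]}, yaml_like gives the lines
"name: demo", "tags:", "  - a", "  - b", while ini_like gives "name = demo",
"tags =", "\t* a", "\t* b"; the underlying structure is the same and only the
fixed parameters differ.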

Huaiyu



More information about the Python-list mailing list