XML overuse? (was Re: Python to XML to Python conversion)

Tue Jul 16 18:14:51 EDT 2002

holger krekel <pyth at devel.trillke.net> wrote:
>Huaiyu Zhu wrote:
>> holger krekel <pyth at devel.trillke.net> wrote:
>> >Huaiyu Zhu wrote:
>> >> Readability for machines does not have to come at the expense of readability
>> >> for humans.  A few years back I experimented with an indentation based data
>> >> format that is:
>> >> 
>> >> - as readable as emacs's outline mode
>> >> - reduces to common conventions like this paragraph for simple cases
>> >> - allows mixed nested structures of set, sequence, dictionary, and seqdict
>> >> - can include binary data
>> >> - can handle different encodings/encryptions in different elements
>> >> - has less than 5% bloat on average, in contrast to XML's over 100% bloat
>> >
>> >do you have any code or design documents for this?  
>> >
>> >Sounds quite interesting.
>> 
>> The basic idea is quite simple: consider a data structure as a tree; denote
>> the type of branching at each node; indent the subtrees.  It appears to me
>> that indentation is easier to handle than quotes and escapes.  Here's a
>> simple example:
>>
>> ...snipped...
>>
>> OK, hope this makes sense.
>
>It does and it's very interesting.  It does sound a lot like 
>http://yaml.org to me, though  (They even have an RFC).
>Don't you think YAML might be a superset of your ideas?

Thanks a lot for this link.  The basic idea is very similar, but apparently
they have done a lot more formal specification than I have ever
attempted.  There are several differences in the details, so neither is a
superset of the other.  I'll comment on the differences once I have time to
read through their docs.

>Let me add some random thoughts/questions about your/yaml's scheme 
>(i hope i am not missing something obvious):

The following comments concern only what my scheme does:

>- how is a binary data-stream's size determined? What about
>  open-ended streams?  Embedding of arbitrary data-streams
>  is very useful (IMO).

It's determined by the block structure denoted by indentation.
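
A minimal Python sketch of that rule, to make it concrete.  It is not the
original code: the helper names, the four-space indent, and the hex encoding
of the payload are all assumptions made for illustration.  The only point is
that a block ends at the first line indented no deeper than its header, so
no length field or terminator is needed.

```python
def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def block_end(lines, start):
    """Return the index just past the block owned by lines[start].

    The block is every following line indented deeper than lines[start].
    Blank lines count as part of the block only if more block lines follow.
    """
    base = indent_of(lines[start])
    end = start + 1
    i = start + 1
    while i < len(lines):
        if lines[i].strip() == "":
            i += 1
            continue
        if indent_of(lines[i]) <= base:
            break  # first line at or left of the header closes the block
        i += 1
        end = i
    return end


# Hypothetical document: the payload is hex text purely for readability.
text = """\
photo:
    89504e470d0a1a0a
    0000000d49484452
caption: a small image
"""
lines = text.splitlines()
end = block_end(lines, 0)
payload = [line.strip() for line in lines[1:end]]
```

Note that a rule like this needs the end of the block to be visible, so a
truly open-ended stream would only be delimited by end of input.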

>- somehow your and yaml's scheme remind me of today's wiki techniques.  
>  E.g. Wikis have methods of sequence-detection (bullets ...) and they
>  have a commitment to readability. Of course, they are generally more 
>  concerned with graphical views than with being a concise persistence scheme.  

The emphasis is on using indentation and leading markers to denote
structure, in contrast to the tags, punctuation, quotes and escapes of the
markup languages.
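
To illustrate the general approach (not the actual syntax of my scheme,
which is not reproduced in this message), here is a small Python sketch
that writes a nested structure as indented lines, with a leading marker
naming the kind of branching at each node.  The markers "@dict", "@seq" and
"@set", the four-space indent, and the function name dump are hypothetical,
invented for this example.

```python
def dump(node, depth=0):
    """Return a list of text lines for node, one tree level per indent step."""
    pad = "    " * depth
    if isinstance(node, dict):
        out = [pad + "@dict"]
        for key, value in node.items():
            out.append(pad + "    " + str(key))   # key one level in
            out.extend(dump(value, depth + 2))    # its value one level further
        return out
    if isinstance(node, (list, tuple)):
        out = [pad + "@seq"]
        for item in node:
            out.extend(dump(item, depth + 1))
        return out
    if isinstance(node, (set, frozenset)):
        out = [pad + "@set"]
        for item in sorted(node):                 # sorted only for stable output
            out.extend(dump(item, depth + 1))
        return out
    return [pad + str(node)]                      # leaf: plain data, no quoting


data = {"title": "todo", "items": ["bibtex", "lists"]}
rendered = dump(data)
print("\n".join(rendered))
```

A leaf whose text happens to start with "@" would be ambiguous in this toy
version; a real format would need a rule for that case.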

>- Is there a canonical conversion between XML and your scheme/YAML?
>  Shouldn't be too hard, anyway...

In principle both can express anything.  In practice I've never tried
converting between my scheme and XML.  XML is too complicated in some sense:
restriction to text, whitespace handling, quotes, named tags, specs of
repeats, etc.  I do not know if there is a syntax-free specification of XML
data structures.  In my scheme, the syntax comes after the abstract
structures are specified.  Structure marks are never buried in data.

>- how do you express external addresses akin to XPath? 
>  Ideas:
>    - Mappings are easy, just take the 'key'. 
>    - Sequences are easy (take the sequence number) but not very robust
>      to deletions and insertions of items.
>    - tag-names (IDs) which can be associated with any item might be interesting.
>      readability is likely to suffer, probably.

An address is not a data structure, but a particular data item.  It has no
meta-meaning in the scheme.  I've experimented with alias nodes, sort of
like symbolic links in file systems.  I found life is easier without them.
I also believe that one document's metadata is another's plain data.

>btw, I wonder whether some form of your and/or YAML's ideas should play a
>role in the new persistence-SIG.  While the actual persistence mappings 
>are not in the focus there are certainly some interesting connections 
>between the two areas.

There are facilities for conversion among the data structures: set, seq,
dict, seqdict, with various specifications.  I do not see how YAML indicates
the types of structures.
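
A rough Python sketch of what such conversions could look like.  This is
not the original code.  It assumes a "seqdict" is a mapping that also
remembers the order of its entries, modeled here as a list of (key, value)
pairs, and all function names are made up for the example.

```python
# Assumption: a seqdict is represented as a list of (key, value) pairs.

def seqdict_to_dict(pairs):
    """Drop the ordering, keep the key -> value mapping."""
    return dict(pairs)


def seqdict_to_seq(pairs):
    """Drop the keys, keep the values in their original order."""
    return [value for _key, value in pairs]


def dict_to_seqdict(mapping, key_order):
    """Impose an explicit ordering on a plain mapping."""
    return [(key, mapping[key]) for key in key_order]


def seq_to_set(items):
    """Drop ordering and duplicates."""
    return set(items)


entries = [("title", "todo"), ("year", 2002), ("owner", "todo")]
as_dict = seqdict_to_dict(entries)
as_seq = seqdict_to_seq(entries)
as_set = seq_to_set(as_seq)
round_trip = dict_to_seqdict(as_dict, ["title", "year", "owner"])
```

Each conversion discards or requires one piece of information (order, keys,
or duplicates), which is why going back from dict to seqdict needs the key
order to be supplied.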

>>  If this is still interesting I'll dig the thing
>> out.  I have documents and code (perl and python) at home, but I'll have to 
>> ...
>
>this sure is useful. Especially for me since i work with a (perl-) 
>friend on a project which needs to address the persistence-question. And
>we want to have it interoperable, simple and fast.  I guess looking
>at YAML might avoid that you have to dig too much into old harddisks :-)

YAML is very interesting.  I'd say about 60-80% similar to what I did.  I'm
sure I have stuff that they don't have.  I'll dig up my stuff anyway.  If
anyone is interested in seeing a mess of code and doc ... :-)

I started with bibtex, todo lists, etc, as I had a problem keeping track of
my things.  Then it drifted and I got distracted and eventually even lost
track of this project itself.  The hype on XML got me really depressed, as I
thought no one would be interested in a direction I regard as fundamentally
better than XML.  Any new development that can make my life easier is very
welcome.

Huaiyu