XML overuse? (was Re: Python to XML to Python conversion)
Huaiyu Zhu
huaiyu at gauss.almadan.ibm.com
Tue Jul 16 18:14:51 EDT 2002
holger krekel <pyth at devel.trillke.net> wrote:
>Huaiyu Zhu wrote:
>> holger krekel <pyth at devel.trillke.net> wrote:
>> >Huaiyu Zhu wrote:
>> >> Readability for machines does not have to come at the expense of readability
>> >> for humans. A few years back I experimented with an indentation based data
>> >> format that is:
>> >>
>> >> - as readable as emacs's outline mode
>> >> - reduce to common conventions like this paragraph for simple cases
>> >> - allow mixed nested structures of set, sequence, dictionary, and seqdict
>> >> - can include binary data
>> >> - can handle different encodings/encryptions in different elements
>> >> - with average less than 5% bloat, in contrast to XML's over 100% bloat
>> >
>> >do you have any code or design documents for this?
>> >
>> >Sounds quite interesting.
>>
>> The basic idea is quite simple: consider a data structure as a tree; denote
>> the type of branching at each node; indent the subtrees. It appears to me
>> that indentation is easier to handle than quotes and escapes. Here's a
>> simple example:
>>
>> ...snipped...
>>
>> OK, hope this makes sense.
>
>It does and it's very interesting. It does sound a lot like
>http://yaml.org to me, though (They even have an RFC).
>Don't you think YAML might be a superset of your ideas?
Thanks a lot for this link. The basic idea is very similar, but apparently
they have done a lot more formal specification than I have ever
attempted. There are several differences in the details, so neither is a
superset of the other. I'll comment on the differences once I have time to
read through their docs.
>Let me add some random thoughts/questions about your/yaml's scheme
>(i hope i am not missing something obvious):
The following comments concern only what my scheme does:
>- how is a binary data-stream's size determined? What about
> open-ended streams? Embedding of arbitrary data-streams
> is very useful (IMO).
It's determined by the block structure denoted by indentation.
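
A minimal Python sketch of how indentation alone can delimit a block, so that
no length field or terminator is needed. The original example was snipped, so
this is not the actual syntax of the scheme: the "key:" header lines, the
four-space indent, the base64 payload and the helper names indent_of and
read_block are all assumptions made for illustration.

```python
import base64


def indent_of(line):
    """Count the leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def read_block(lines, start):
    """Return (payload_lines, next_index) for the block headed by lines[start].

    The block is every following line indented deeper than the header; it
    ends at the first line that is not (an empty line also ends it here).
    """
    base = indent_of(lines[start])
    end = start + 1
    while end < len(lines) and indent_of(lines[end]) > base:
        end += 1
    # Strip the indentation of the first payload line from every payload line.
    child = indent_of(lines[start + 1]) if end > start + 1 else 0
    payload = [line[child:] for line in lines[start + 1:end]]
    return payload, end


# Hypothetical document: a base64-encoded payload under "photo:".
doc = [
    "photo:",
    "    aGVsbG8g",
    "    d29ybGQ=",
    "title: greeting",
]
payload, next_index = read_block(doc, 0)
data = base64.b64decode("".join(payload))
```

Here the payload ends at "title: greeting" purely because that line returns
to the header's indentation level.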
>- somehow your and yaml's scheme remind me of today's wiki techniques.
> E.g. Wikis have methods of sequence-detection (bullets ...) and they
> have a commitment to readability. Of course, they are generally more
> concerned with graphical views than with being a concise persistence scheme.
The emphasis is on using indentation and leading markers to denote
structure, in contrast to the tags, punctuation, quotes and escapes used in
the markup languages.
>- Is there a canonical conversion between XML and your scheme/YAML?
> Shouldn't be too hard, anyway...
In principle both can express anything. In practice I've never tried
conversion between my scheme and XML. XML is too complicated in some sense:
restriction to text, whitespace, quotes, tags with names, specs of
repeats, etc. I do not know if there is a syntax-free specification of XML
data structures. In my scheme, the syntax comes after the abstract
structures are specified. Structure marks are never buried in data.
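
A rough Python sketch of that ordering: the abstract structure is an ordinary
nested value, and the syntax is chosen afterwards by a serializer. The marker
characters ("%" for a branching node, "@" for a dictionary key, "=" for a
leaf), the two-space indent and the function name dump are hypothetical, not
the markers the scheme actually used. Because every line starts with a marker,
a leaf's text cannot be mistaken for structure; leaves are assumed to be
single-line here.

```python
# Hypothetical names for the branching types.
BRANCH = {dict: "dict", list: "seq", set: "set"}


def dump(node, depth=0):
    """Serialize nested data as indented lines, one leading marker per line."""
    pad = "  " * depth
    kind = type(node)
    if kind not in BRANCH:
        # Leaf: its text always follows "= ", so it is never read as structure.
        return [pad + "= " + str(node)]
    lines = [pad + "% " + BRANCH[kind]]
    if kind is dict:
        for key, value in node.items():
            lines.append(pad + "  @ " + str(key))
            lines.extend(dump(value, depth + 2))
    else:
        # Sets have no order, so sort them to get a stable output.
        items = sorted(node, key=str) if kind is set else node
        for item in items:
            lines.extend(dump(item, depth + 1))
    return lines


doc = {"title": "todo", "items": ["buy milk", "call Bob"]}
text = "\n".join(dump(doc))
# text is:
# % dict
#   @ title
#     = todo
#   @ items
#     % seq
#       = buy milk
#       = call Bob
```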
>- how do you express external addresses akin XPATH?
> Ideas:
> - Mappings are easy, just take the 'key'.
> - Sequences are easy (take the sequence number) but not very robust
> to deletions and insertions of items.
> - tag-names (IDs) which can be associated with any item might be interesting.
> readability is likely to suffer, probably.
An address is not a data structure, but a particular data item. It has no
meta-meaning in the scheme. I've experimented with alias nodes, sort of
like symbolic links in file systems. I found life is easier without them.
I also believe that one document's metadata is another's plain data.
>btw, I wonder whether some form of your and/or YAML's ideas should play a
>role in the new persistence-SIG. While the actual persistence mappings
>are not in the focus there are certainly some interesting connections
>between the two areas.
There are facilities for conversion among the data structures: set, seq,
dict, seqdict, with various specifications. I do not see how yaml indicates
the types of structures.
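
As an illustration only, conversions among the four structures might look
like this in Python. It assumes that "seqdict" means an ordered sequence of
(key, value) pairs, represented here as a list of tuples; the function names
and the rules for duplicates and ordering are assumptions, not the
specifications mentioned above.

```python
def seqdict_to_dict(pairs):
    """Ordered (key, value) pairs -> dict; a repeated key keeps its last value."""
    return dict(pairs)


def dict_to_seqdict(mapping):
    """dict -> list of (key, value) pairs in the dict's iteration order."""
    return list(mapping.items())


def seq_to_set(items):
    """Sequence -> set; order and duplicates are dropped."""
    return set(items)


def set_to_seq(items, key=None):
    """Set -> sequence; an ordering must be chosen, here by sorting."""
    return sorted(items, key=key)


def seq_to_dict(items):
    """Sequence -> dict keyed by position."""
    return dict(enumerate(items))


pairs = [("author", "Zhu"), ("year", 2002), ("author", "Krekel")]
as_dict = seqdict_to_dict(pairs)
as_seq = set_to_seq(seq_to_set(["b", "a", "b"]))
by_index = seq_to_dict(["x", "y"])
```

Each conversion that loses information (order, duplicates, repeated keys)
needs an explicit rule, which is presumably where the various specifications
come in.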
>> If this is still interesting I'll dig the thing
>> out. I have documents and code (perl and python) at home, but I'll have to
>> ...
>
>this sure is useful. Especially for me since i work with a (perl-)
>friend on a project which needs to address the persistence-question. And
>we want to have it interoperable, simple and fast. I guess looking
>at YAML might avoid that you have to dig too much into old harddisks :-)
YAML is very interesting. I'd say about 60-80% similar to what I did. I'm
sure I have stuff that they don't have. I'll dig up my stuff anyway. If
anyone is interested in seeing a mess of code and doc ... :-)
I started with bibtex, todo lists, etc, as I had a problem keeping track of
my things. Then it drifted and I got distracted and eventually even lost
track of this project itself. The hype on XML got me really depressed, as I
thought no one would be interested in a direction I regard as fundamentally
better than XML. Any new development that can make my life easier is very
welcome.
Huaiyu