Paul Boddie paul at boddie.net
Mon Jun 23 11:42:49 CEST 2003

martin at v.loewis.de (Martin v. Löwis) wrote in message news:<m3wuffi1x7.fsf at mira.informatik.hu-berlin.de>...
> But your observation is right in principle: to really understand what
> the "most common text format in history" is, and to allow "7-bit plain
> text ASCII" as a candidate, one would need to specify, in more detail,
> what exactly that is.

Indeed. The assertion that "7-bit plain text ASCII" is even a
meaningful format is highly dubious, at least when it comes to
understanding the information presented in that format. At the risk of
reaching "comp.sys.acorn.misc" levels of stating the obvious for lack
of a more interesting subject to discuss, it is fair to say that
justifying that "format" on the grounds that "there are a lot of
parsers which use it" is analogous to telling a Japanese person that
they only need to know the Latin alphabet to be able to understand
most Western languages.

There are some pretty solid reasons for choosing XML to represent
data. Firstly, the basic aspects of the data are standardised beyond
the character encoding employed - in fact, the "plain ASCII" arguments
frequently neglect or understate the obvious internationalisation
issues that come with rolling your own format, whereas elementary
things such as character encodings *are* covered by the XML standards.
Secondly, the standardisation of how the structure of XML documents is
represented means that one doesn't necessarily need to be concerned
about the mechanics and edge cases of how this is achieved - if one
were to develop one's own syntax, one would need to spend time
debugging the parser and making sure that potential ambiguities have
been removed, whereas all this work has been done for XML already.
Finally, it can often be a distraction to focus on the textual form of
XML documents - the various APIs are arguably the most interesting
part of the whole XML movement, albeit a part which is usually made
possible by the standardisation of the textual representation.
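
To make the first two points concrete, here is a minimal sketch using
the standard library's xml.etree.ElementTree. The document and its
element names ("event", "summary") are invented for illustration. The
same content is serialised in two encodings, and the parser picks the
right one from the XML declaration without any help from the
application.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are made up for this sketch.
TEMPLATE = (
    '<?xml version="1.0" encoding="{enc}"?>\n'
    "<event><summary>Caf\u00e9 &amp; cr\u00e8me</summary></event>"
)

# The same logical document, serialised in two different encodings.
latin1_bytes = TEMPLATE.format(enc="ISO-8859-1").encode("iso-8859-1")
utf8_bytes = TEMPLATE.format(enc="UTF-8").encode("utf-8")


def summary(data):
    """Return the text of the <summary> child of the root element."""
    # The parser reads the encoding from the XML declaration and also
    # resolves the &amp; entity, so no hand-written decoding is needed.
    root = ET.fromstring(data)
    return root.findtext("summary")


latin1_summary = summary(latin1_bytes)
utf8_summary = summary(utf8_bytes)
```

The byte strings differ, but both calls should yield the same text,
which is the sort of thing a home-grown "plain ASCII" format leaves
for each application to work out.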

One interesting related exercise is to download an RFC for something
like the iCalendar format and find all the places where the formatting
of the data is explicitly mentioned. Then, consider how much of that
elaboration would be made superfluous if the format employed XML
instead.
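
As a rough illustration of that exercise, the sketch below hand-codes
two of the formatting rules which the iCalendar specification spells
out: line folding (a line break followed by a single space or tab
continues the previous line) and backslash escaping in text values. It
is deliberately simplified, ignoring property parameters and quoted
values, and the function names as well as the XML element "summary"
are assumptions made for the comparison rather than any standard
mapping.

```python
import re
import xml.etree.ElementTree as ET


def unfold(raw):
    """Undo line folding: a line break plus one space or tab is removed."""
    return re.sub(r"\r?\n[ \t]", "", raw).splitlines()


def unescape_text(value):
    """Undo backslash escapes in a text value (\\n or \\N means newline)."""
    special = {"n": "\n", "N": "\n"}
    return re.sub(
        r"\\(.)",
        lambda m: special.get(m.group(1), m.group(1)),
        value,
    )


# A folded, escaped content line. After the line break, the first space
# is the fold marker and the second one belongs to the text itself.
raw = "SUMMARY:Lunch\\, then a long\r\n  meeting\\; bring notes\r\n"

# Simplification: split on the first colon, ignoring parameters.
name, _, value = unfold(raw)[0].partition(":")
ical_summary = unescape_text(value)

# Hypothetical XML equivalent: the generic parser deals with the syntax.
xml_summary = ET.fromstring(
    "<summary>Lunch, then a long meeting; bring notes</summary>"
).text
```

Both routes should arrive at the same string, but only the first one
needed format-specific rules to be read from a specification and
implemented by hand.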


More information about the Python-list mailing list