Design-by-Committee

Michael Chermside mcherm at destiny.com
Mon May 7 18:58:48 EDT 2001


> David LeBlanc <whisper at oz.nospamnet> wrote in comp.lang.python:
> > XML is, imho, the most important thing that's happened to data since the 
> > advent of computers. We spent roughly 5 decades polishing and structuring 
> > computer languages whilst leaving data pretty much an ad-hoc affair. Now, 
> > with XML, we can create, share and manipulate fine grained data 
> > structures across architectures and programming languages.

Remco Gerlich <scarblac at pino.selwerd.nl> responded:
> Blatant hype. You could already always define a data format for your files
> that other programs could use. Now there is XML, the syntax is standard -
> but what the data in that file *means* still needs to be defined for every
> application. A MS Word file may be in XML soon, but that doesn't mean it's
> any more usable than a binary one now. It probably will be many times bigger.

While it is true that XML is overhyped, I agree totally with David. Let your
mind drift back a few decades...

<camera loses focus, then focuses in on new scene>

Jane: This ASCII thing is the greatest thing that's happened to text data 
  since the advent of the punch card!

Carl: Blatant hype. You could already use EBCDIC, BCD, or Hollerith. And better
  yet, you can define a binary format. After all, if you're storing (for
  example) dates, or fields like name or address which can contain only A-Z and
  a couple of special characters like "-", then your file sizes can be
  SIGNIFICANTLY decreased by using a task-specific encoding.

Jane: Yes, but... well, if everyone uses ASCII, then the files will be
  human-readable! And they'll be standardized. We'll be able to generate files 
  and pass them back and forth from one program to another and they'll be
  readable.

Carl: Not so. Universal use of ASCII means that the encoding is standard, but
  the format of that text -- what it *means* -- still has to be defined for
  every application. One may use commas to separate values, while another
  uses tabs. It won't be any more portable than a binary file.

<camera fades to black, then returns to the present>

XML has been overhyped... it will never lead us to the magical land where all
data is interchangeable, and is perfectly understood by intelligent agents.
Hey... where I work, we pay very smart individuals for thousands of hours of
work just to figure out how to map data from one database to another. 90% of
their time is spent on things like figuring out how to fill in the "vendor"
field when it's not provided but is absolutely required. No automatic
translation program will ever replace them.

However, overhyped though it is, XML is VERY important. Because, like ASCII,
it is fast becoming a universal standard. And though it solves only SOME of
the problems of data translation, recognition, and structuring, it DOES solve
some. When was the last time you worried about what encoding someone used
for their source files? (Probably the last time you had \n vs. \r\n problems,
but that's a trivial dialect issue by comparison.) And how many times has the
use of text files proved more useful than binary files? The argument that XML
files will be many times bigger is patently specious... let's start passing
them around in compressed format and make that irrelevant. XML is not ideal, nor will it
solve everything, but it is terribly powerful if everyone uses it. And they
will.
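
To put a rough number on the size argument, here is a minimal Python sketch using the standard gzip module. The record layout is invented purely for illustration; real XML will compress more or less well depending on how repetitive its markup is.

```python
import gzip

# Build a repetitive XML document of the kind a data export might produce.
# The <record>/<name>/<vendor> layout is made up purely for illustration.
records = "".join(
    f"<record><name>Customer {i}</name><vendor>ACME</vendor></record>"
    for i in range(1000)
)
xml_bytes = f"<records>{records}</records>".encode("utf-8")

# Compress it and compare sizes; decompression gives back the exact bytes.
compressed = gzip.compress(xml_bytes)
print(len(xml_bytes), len(compressed))
```

On markup this repetitive the compressed output is a small fraction of the original; the exact ratio depends on the data.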

Fortunately (to return to the topic of this newsgroup) Python is well-prepared 
to enter this world, with excellent XML support (and Unicode support) in the 
standard libraries. All I wish for now is a version of "shelve" that stores the 
objects in an XML file. Hmm... wouldn't be too hard... maybe I should write it. (Any 
interest?)
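
The XML-backed "shelve" wished for above is easy to sketch. The following is a minimal, hypothetical XMLShelf class (the class name and the file layout are made up here; this is not an existing module) built on the standard xml.etree.ElementTree and collections.abc.MutableMapping. Unlike the real shelve, which pickles arbitrary objects, it assumes string keys and only str, int, float and bool values, so that the file stays human-readable.

```python
import os
import xml.etree.ElementTree as ET
from collections.abc import MutableMapping

# Hypothetical sketch: a tiny shelve-like mapping persisted as XML.
# Assumption: only str, int, float and bool values are supported, so every
# value can be written as readable text plus a "type" attribute.
_DECODERS = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda s: s == "True",
}


class XMLShelf(MutableMapping):
    def __init__(self, path):
        self.path = path
        self._data = {}
        # Load existing entries, decoding each value from its "type" attribute.
        if os.path.exists(path):
            root = ET.parse(path).getroot()
            for item in root.findall("item"):
                decode = _DECODERS[item.get("type")]
                # An empty element has text None, so fall back to "".
                self._data[item.get("key")] = decode(item.text or "")

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be strings")
        if type(value).__name__ not in _DECODERS:
            raise TypeError(f"unsupported value type: {type(value).__name__}")
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def sync(self):
        # Rewrite the whole file as <shelf><item key="..." type="...">value</item>...</shelf>.
        root = ET.Element("shelf")
        for key, value in self._data.items():
            item = ET.SubElement(root, "item", key=key, type=type(value).__name__)
            item.text = str(value)
        ET.ElementTree(root).write(self.path, encoding="utf-8", xml_declaration=True)

    def close(self):
        self.sync()
```

Usage would mirror shelve: open with XMLShelf("data.xml"), assign s["year"] = 2001, then call s.close(). Reopening the file rebuilds each value from the type attribute stored on its <item> element.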

-- Michael Chermside






More information about the Python-list mailing list