[Tutor] XML: Expletive Deleted (OT)

Alan Gauld alan.gauld at btinternet.com
Fri Jun 16 01:05:41 CEST 2006

Just picked this up after being out for most of the week...

"Carroll, Barry" <Barry.Carroll at psc.com> wrote in message

> One reason for choosing a human-readable format is the desire to
> visually confirm the correctness of the stored data and format.

That's a very dangerous assumption: how do you detect unprintable
characters, tabs instead of spaces, trailing spaces on a line, etc.?
While text representations are helpful, you should never rely on the
human eye to validate a data file.
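As a minimal sketch of the point, here is a program doing the checking
instead of the eye (the sample data is invented for illustration):

```python
def whitespace_problems(text):
    """Return (line_number, issue) pairs for defects the eye misses."""
    problems = []
    for num, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((num, "tab character"))
        if line != line.rstrip():
            problems.append((num, "trailing whitespace"))
        if any(ord(ch) < 32 and ch != "\t" for ch in line):
            problems.append((num, "unprintable control character"))
    return problems

# Three lines that would all look fine printed on screen:
sample = "name\tvalue\nprice 10 \nnote\x07done"
for num, issue in whitespace_problems(sample):
    print(num, issue)
```

A tab, a trailing space and a BEL character all render as "looks OK"
to a human reader; the program flags every one of them.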

> can be invaluable when troubleshooting a bug involving stored data. If
> there is a tool between the user and the data, one must then rely upon
> the correctness of the tool to determine the correctness of the data.

Or the correctness of the eye. I know which one I prefer - a tested
tool. The human eye is not a data parser, but it flatters to deceive
by being nearly good enough.

> In a case like this, nothing beats the evidence of one's eyes, IMHO.

Almost anything beats the human eye IME :-)
Actually if you must use eyes do so on a hex dump of the file, that
is usually reliable enough if you can read hex...
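For the curious, a hex dump is only a few lines of Python; this is a
rough sketch, not a polished tool, but it puts the bytes where the eye
can actually see them:

```python
def hex_dump(data, width=16):
    """Render bytes as offset, hex values and printable-ASCII gloss."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hexpart:<{width * 3}} {text}")
    return "\n".join(lines)

print(hex_dump(b"name\tvalue\r\n"))
```

The tab (09) and the CR/LF pair (0d 0a) are unmistakable in hex,
whereas a printed listing hides them completely.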

> In their book, "The Pragmatic Programmer: From Journeyman to Master"
> (Addison Wesley Professional), Andrew Hunt and David Thomas give
> another reason for storing data in human readable form:
>    The problem with most binary formats is that the context necessary
>    to understand the data is separate from the data itself. You are
>    artificially divorcing the data from its meaning. The data may
>    as well be encrypted; it is absolutely meaningless without the
>    application logic to parse it. With plain text, however, you can
>    achieve a self-describing data stream that is independent of the
>    application that created it.

But at a very high risk. I don't dislike text files, BTW, and am not
suggesting that text should not be used, but its parsing is best left
to machines; the eye is only a rough and unreliable guide.

And if your data volumes are high, go with binary; you'll need tools
to parse that much data anyway, so you might as well save the space!

The Hunt/Thomas book is excellent BTW - I recommend it highly.
Even though I disagree with several of their suggestions (*), I agree
with far more.

(*)They recommend sticking with one text editor whereas I use
about 5 or 6 on a regular basis depending on the job I'm doing and
the platform I'm working on. Emacs on X Windows for new files
but vim for quick fix ups, vim on Windows for most things,
ed or ex for text based email or over a phone line.

> This is an example of the resource balancing act that computer people
> have been faced with since the beginning.  The most scarce/expensive
> resource dictates the program's/system's design.  In Alan's example
> high speed bandwidth is the limiting resource.  A data transmission
> method that fails to minimize use of that resource is therefore a bad
> solution.

Unfortunately the software industry is full of people who by and large
don't understand networks, so they just ignore them. At least that's
my experience! SOA using SOAP/XML is probably the most inefficient
and unreliable set of data networking technologies you could possibly
come up with. But the focus is on cutting developer cost, because the
people inventing it are developers! In almost every sizeable project
the cost of development will be significantly less than the cost of
deployment - in most of my projects it usually works out something
like:

development - 15%
deployment - 30%
support - 15%
training - 25%
documentation - 5%
management overhead - 10%

Saving 25% of development costs reduces total cost by around 4%, but
if that puts deployment costs up by 10% the net gain is under 1%!
And in the XML case it often puts deployment costs up by 100%
- a net loss of around 26%!!
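The arithmetic is easy to check for yourself; here is a rough worked
version using the baseline cost shares listed above (the exact net
figure depends on how you round):

```python
# Baseline shares of total project cost, in percentage points.
shares = {"development": 15, "deployment": 30, "support": 15,
          "training": 25, "documentation": 5, "management": 10}

# 25% off development saves 25% of a 15-point share.
dev_saving = 0.25 * shares["development"]

# Net effect if deployment rises 10% versus 100%.
net_if_deploy_up_10 = dev_saving - 0.10 * shares["deployment"]
net_if_deploy_up_100 = dev_saving - 1.00 * shares["deployment"]

print(dev_saving)             # 3.75 points saved
print(net_if_deploy_up_10)    # 0.75 points net gain
print(net_if_deploy_up_100)   # -26.25 points net loss
```

The point stands regardless of the exact percentages: a saving on a
small cost slice is easily swamped by an increase on a big one.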
Now those figures come from a typical project that I work on, which
probably has a total budget of between $10-100 million. If your
budget is smaller, say less than $1 million, then the balance may
well change. But over 50% of the IT industry works on projects
with >$1m budgets, according to both Datamation and Infoweek.

[ The only SOA/XML book that addresses this side of XML usage
is the excellent "SOA: A Field Guide" by Thomas Erl. Erl also
suggests some mitigating strategies to get round it. ]

> So here's my off-topic question: Ajax is being touted as the
> 'best-known method' (BKM) for making dynamic browser-based
> applications, and XML is the BKM for transferring data in Ajax land.
> If XML is a bad idea for network data-transfer, what medium should
> be used instead?

The example I gave of having to upgrade the site's network was actually
an early adopter of an XML/Ajax architecture! There are lots of other
formats around - some are even self-describing (CSV and TLV are cases).
Others simply hold the definition in an accessible library, so you only
have to transport it once - e.g. IDL and ASN.1 - or optionally compile
it into your code for maximum efficiency. ASN.1 is typically around 50
times more compact than XML - that makes a big difference over a
DSL line, and an even bigger one over a dial-up modem or GPRS link....
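To make the size difference concrete, here is a toy comparison of an
XML record against the same data packed as TLV (type-length-value)
fields; the field tags and record layout are invented for illustration,
not taken from any real protocol:

```python
import struct

def tlv(field_type, value):
    """Pack one field as a 1-byte type, 2-byte length, then the value."""
    return struct.pack("!BH", field_type, len(value)) + value

# The same sensor reading in both representations.
xml_record = b"<reading><sensor>42</sensor><temp>21.5</temp></reading>"
tlv_record = (tlv(1, struct.pack("!H", 42)) +      # sensor id as uint16
              tlv(2, struct.pack("!f", 21.5)))     # temperature as float32

print(len(xml_record), len(tlv_record))   # 55 12
```

Even on this tiny record the XML is over four times bigger, and real
ASN.1 encoders do far better still by dropping the per-field headers
a schema makes redundant.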

XML has its place, but that place has to be gauged very carefully.
Unfortunately I don't see much acknowledgement of its weaknesses
in the industry at large; instead it is being deployed with wild
abandon.

Alan g. 
