[Tutor] Machine Vs. Human Parsing (Was: XML: Expletive Deleted)(Way OT!)

Alan Gauld alan.gauld at freenet.co.uk
Fri Jun 16 21:37:56 CEST 2006


> > Almost anything beats the human eye IME :-)
> > Actually if you must use eyes do so on a hex dump of the file, that
> > is usually reliable enough if you can read hex...
> If I gave the impression that the human eye is the only useful means of
> examining and verifying stored data, I apologize.  I intended to say
> that the human eye, and the brain that goes with it, is an invaluable
> tool in evaluating data.  I stand by that statement.
>
> The most sophisticated tool is only as good as the developer(s) who
> made it.

Of course, and that's why I said I trusted a *tested* tool. You have to
be sure the tool is functioning correctly before you can use it
confidently.

> Since software is ultimately written by humans

Not always, but that's a whole other OT debate :-)

> You look at the data.

I agree, that's part of testing/debugging the tool.  Most specifically
you find known data, whether that be the data causing the problem
or a reference data source with which you can compare.

> A case in point.  I used to test audio subsystems on PC motherboards.
> ...
> I used an audio editing program to display the waveform

ie a Tool.

> The editing SW was no help there.  So I switched tools,
> and displayed the capture buffer with a simple file dump program.

Again a tool. Why? Because your eyes couldn't parse the audio data.
Your eyes could parse the tool output which was intended for human
consumption.

> The story goes on, but that's enough to illustrate my point.

And mine. You needed the right tools to interpret the raw data
in the file before your eyes and brain could diagnose the problem.
The issue I was raising was that textual data on disk is
notoriously difficult to accurately interpret. Once it has passed
through a tool that presents it in an unambiguous format the
human eye is entirely appropriate.

> the audio driver, nor the test SW, nor the editing tool could
> show the real problem.  All of them were 'tested tools',

The need for the *right* tool does not obviate the need for a tool
designed to interpret the raw data. Arguably the editing SW did
highlight the real problem - the corruption at the start of the file.
Its representation was good enough to point you at the hex editor.

> data they were not designed to handle, they produced incorrect
> or incomplete results.

Just like the human eye when reading raw data out of a text file!

> I could cite other examples from other disciplines,
> but this one suffices:  no SW tool should be relied upon to be correct
> in all cases.

Indeed, diagnosing data problems is always fraught with difficulty.
It's one of the hardest problem types to track down. My point was
merely that the human eye is not reliable when reading raw text files
and other, better tools exist. I did not mean to imply that any magic
tool exists that works for any kind of data, but the eye is no better
than any other kind of tool in this regard. It is very good when
presented with data that is formatted for it but not so good when
presented with "nearly right" data.

> I trust my eyes to see things tools can't

And I trust tools to see what my eyes can't!
Neither is infallible and I need both. If I'm looking at a text file
I'll fire it up in a text editor first for the obvious checks. But if
it looks OK I'll then either load it in a hex editor or some other tool
designed to read that data format.
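
If no hex editor is to hand, a few lines of Python give a rough
equivalent. This is only a sketch: hex_dump is a name made up for
illustration, not a standard library function, and the sample bytes are
invented to show a tab, a trailing space and a CR/LF line ending.

```python
def hex_dump(data, width=16):
    """Return an offset / hex / ASCII dump of a bytes object."""
    pad = width * 3 - 1  # width of the hex column, for alignment
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


# Invented sample: a tab, a trailing space and a CR/LF pair that
# would all be invisible in an ordinary text editor.
sample = b"name\tvalue \r\n"
print(hex_dump(sample))
```

To inspect a real file, read it in binary mode and pass the bytes in,
assuming it is small enough to hold in memory.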

> when necessary, but I usually just print out '\t', '[SP]', etc)

I'm curious how you do that? Presumably a tool?
(Not joking, how do you get that kind of display without a specialist
tool?)
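
One guess at how it might be done, purely as a sketch: a small lookup
table that swaps each whitespace character for a visible marker. The
names VISIBLE and show_whitespace, and the choice of markers, are
assumptions made here, not the method being described above.

```python
# Hypothetical marker table: which characters to expose and how.
VISIBLE = {" ": "[SP]", "\t": "\\t", "\r": "\\r", "\n": "\\n"}


def show_whitespace(text):
    """Replace whitespace characters with visible markers."""
    return "".join(VISIBLE.get(ch, ch) for ch in text)


# A line with a tab and a trailing space before the newline.
print(show_whitespace("key =\tvalue \n"))
```

Even a helper this small is still a tool sitting between the raw data
and the eye.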

> and incorrect.  The programmer's two most valuable tools are her/his
> eyes and brain.  They are always useful, sometimes indispensable.

I completely agree (see my chapter on debugging in my paper
book for a confirmation of that claim!). I was simply pointing out
that the belief that text files are somehow easy to analyse with the
human eye is a dangerous fallacy, and I've spent enough man-days
debugging plain-text to be very, very wary. OTOH I detest the
modern tendency to use binary formats where plain text makes
more sense, such as in configuration files! But that's a very different
use than for bulk data.

Alan G. 



