[Tutor] the art of testing

Dave Angel davea at ieee.org
Wed Nov 25 04:41:32 CET 2009

Serdar Tumgoren wrote:
>> That's a good start.  You're missing one requirement that I think needs to
>> be explicit.  Presumably you're requiring that the XML be well-formed.  This
>> refers to things like matching <xxx>  and </xxx> nodes, and proper use of
>> quotes and escaping within strings.  Most DOM parsers won't even give you a
>> tree if the file isn't well-formed.
> I actually hadn't been checking for well-formedness on the assumption
> that ElementTree's parse method did that behind the scenes. Is that
> not correct?
> (I didn't see any specifics on that subject in the docs:
> http://docs.python.org/library/xml.etree.elementtree.html)
I would also assume that ElementTree does the check.  But the point 
is:  well-formedness is part of the spec, so a failure needs to show up 
explicitly in your list of errors:
     file xxxxyyy.xml  was rejected because .....

I am not saying you need to separately test for it in your validator, 
but effectively it's the second test you'll be doing.  (The first is:  
the file exists and is readable)
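
Those first two checks can be sketched roughly like this.  It is only an 
illustration, not a description of your validator:  load_xml and the 
wording of the messages are made up, and it assumes a Python version 
where a parse failure raises xml.etree.ElementTree.ParseError (2.7 or 
3.x; older releases raise an expat error instead).

```python
import xml.etree.ElementTree as ET


def load_xml(path):
    """Return (tree, None) on success, or (None, reason) if rejected.

    load_xml is a hypothetical helper name used only for this sketch.
    """
    try:
        tree = ET.parse(path)
    except OSError as exc:
        # First check: the file exists and is readable.
        return None, "file %s was rejected because it could not be read: %s" % (path, exc)
    except ET.ParseError as exc:
        # Second check: the parser refused it, so it is not well-formed.
        return None, "file %s was rejected because it is not well-formed: %s" % (path, exc)
    return tree, None
```

The parser does the actual well-formedness work;  the validator's only 
job at this stage is to turn the failure into an entry in the error list.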
>> But most importantly, you can divide the rules into those where you say "if
>> the data looks like XXXX, the file is rejected" versus "if the data looks
>> like YYYY, we'll pretend it's actually ZZZZ, and keep going."  An example of
>> that last might be what to do if somebody specifies March 35.  You might
>> just pretend it's March 31, and keep going.
> Ok, so if I'm understanding -- I should convert invalid data to
> sensible defaults where possible (like setting blank fields to 0);
> otherwise if the data is clearly invalid and the default is
> unknowable, I should flag the field for editing, deletion or some
> other type of handling.

Exactly.  As you said in one of your other messages, the last case is 
"human intervention required."  The humans may then decide to modify the 
spec to reduce the number of cases needing human intervention.  So I see 
the spec and the validator as a matched pair that will evolve together.
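
To make the split concrete, here is a small sketch of the "pretend and 
keep going" rules next to the "needs a human" case.  The function names, 
the return convention and the note text are all hypothetical;  your spec 
decides which rules really exist.

```python
import calendar


def fix_day(year, month, day):
    """Return (day, note); note is None when nothing was changed."""
    last = calendar.monthrange(year, month)[1]  # last valid day of that month
    if day > last:
        # Repairable: March 35 becomes March 31, and we record that we did it.
        return last, "day %d replaced with %d" % (day, last)
    return day, None


def fix_count(text):
    """Return (value, note); value is None when a human must decide."""
    stripped = text.strip()
    if not stripped:
        # Repairable: a blank field gets the agreed default.
        return 0, "blank field set to 0"
    try:
        return int(stripped), None
    except ValueError:
        # Not repairable: no sensible default, so flag it for review.
        return None, "needs review: cannot interpret %r" % text
```

For example, fix_day(2009, 3, 35) gives back 31 together with a note, 
while fix_count("abc") gives back None so the field can be flagged.  
Keeping the note with every repair means the humans can later see which 
rules fire most often when they revise the spec.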

Note that none of this says anything about testing your code.  You'll 
need a controlled suite of test data to help with that.  The word "test" 
is heavily overloaded (and heavily underdone) in our industry.
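
By a controlled suite I mean something along these lines:  small inputs 
whose correct verdict you already know, run against the code every time 
it changes.  This is only a sketch using the standard unittest module;  
is_well_formed and the sample strings are invented for illustration and 
stand in for whatever your validator really exposes.

```python
import unittest
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical function under test: True if xml_text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Controlled test data: each sample's expected verdict is known in advance.
GOOD_SAMPLES = ["<a/>", "<a><b>text</b></a>", '<a name="x &amp; y"/>']
BAD_SAMPLES = ["<a><b></a>", "<a name=x/>", "<a>1 & 2</a>", ""]


class WellFormedTests(unittest.TestCase):
    def test_good_samples_are_accepted(self):
        for sample in GOOD_SAMPLES:
            self.assertTrue(is_well_formed(sample), sample)

    def test_bad_samples_are_rejected(self):
        for sample in BAD_SAMPLES:
            self.assertFalse(is_well_formed(sample), sample)
```

Saved in a file, this can be run with "python -m unittest" followed by 
the module name.  The same pattern extends to the repair rules:  one 
sample per rule, plus the result you expect for it.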

