Hardware take on software testing.

Paddy McCarthy paddy3118 at netscape.net
Sat Jun 7 16:24:09 CEST 2003


First, I'd like to state that I too like the ideas mentioned in XP and
TDD. If the HW design ideas are outside current software methodologies
then I'd like to debate whether adding them would be beneficial.

My comments are interspersed in Peter's mail below...

Peter Hansen <peter at engcorp.com> wrote in message news:<3EE14F0A.3D1E1887 at engcorp.com>...
> Paddy McCarthy wrote:
> > 
> > Peter Hansen <peter at engcorp.com> wrote in message news:<3EE0CF2B.57F474E4 at engcorp.com>...
> > > ... a new approach to design, testing, and coding, called Test-Driven
> > > Development (TDD).
> > 
> > On TDD when do you know you are done?
> 
> Oh, *good* question!  <grin>
> 
> > In the Hardware development process we graph the number of bugs found
> > over time and end up with an S curve, we also set coverage targets
> > (100 percent statement coverage for executable statements is the
> > norm), and rather like the TDD approach of software, some teams have
> > dedicated Verification engineers who derive a verification spec from
> > the design spec and write tests for the design to satisfy this,
> > (independently).
> 
> With TDD, only one
> test at a time is even written, let alone passed by writing new code.
> You would "never" write two tests at the same time since you wouldn't
> necessarily know whether you needed the second test until you had
> written the first.
If I remember correctly, XP advocates a very close working
relationship with the customer, with quick customer feedback on
program development.
> 
> > If TDD uses no random directed generation, then don't you tend to test
> > strictly the assumed behaviour?
> 
> Bob Martin wrote "The act of writing a unit test is more an act of 
> design than of verification.  It is also more an act of documentation 
> than of verification.  The act of writing a unit test closes a remarkable 
> number of feedback loops, the least of which is the one pertaining to 
> verification of function."
I was trying to learn more about how these tests are written. Writing
tests is better than not writing them, but I was trying to show a
possibly different way of writing tests from the way that the program
will be implemented, hopefully complementary, and allowing you
(sometimes, with the right set of random variables/constraints) to
explore more of those corner cases. You might think of a random
directed test as generating a spray of test values for your program,
with the constraints controlling the width of the spray and the
direction of the hose.
We can find hard-to-get-at bugs with this technique in HW design;
that's why I'd also like it added to the software arsenal.
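
As a rough illustration of the "spray of test values" idea, here is a
minimal Python sketch. The function under test (clamp) and the
parameter names (seed, runs, width) are hypothetical, chosen only for
this example; the point is the seeded random generator, the constraints
on what it generates, and the checks that must hold for any input.

```python
import random


def clamp(value, low, high):
    """Hypothetical function under test: limit value to [low, high]."""
    return max(low, min(value, high))


def random_directed_test(seed, runs=1000, width=50):
    """Spray constrained random inputs at clamp() and check properties."""
    rng = random.Random(seed)  # fixed seed so a failing run can be replayed
    for _ in range(runs):
        # Constraints: 'width' sets how wide the spray is, and always
        # generating high >= low keeps the inputs legal.
        low = rng.randint(-width, width)
        high = rng.randint(low, low + width)
        # Aim the hose at the corners: values on and just outside the
        # bounds are picked as often as a value from the wider range.
        value = rng.choice([low - 1, low, high, high + 1,
                            rng.randint(low - width, high + width)])
        result = clamp(value, low, high)
        # Properties that must hold whatever the inputs were.
        assert low <= result <= high, (seed, value, low, high)
        if low <= value <= high:
            assert result == value, (seed, value, low, high)
    return runs
```

Calling random_directed_test(seed=1) runs 1000 generated cases; a
failure reports the seed and inputs so the case can be reproduced and
turned into an ordinary directed test.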

> 
> Let me go back to the "graph the number of bugs" thing you mention above.
> If you are working in a world where that concept holds much meaning, 
> you might have to change gears to picture this: with XP and TDD, you
> generally expect to have *no* bugs to graph. 
> 
> Now fast-forward to months later, when you have literally hundreds of
> little tests, each one having driven the development of a few lines of
> code.  You have effectively 100% code coverage.  In fact, you probably
> have tests which overlap, but that's a good thing here.  Now you make
> a tiny mistake, which traditionally would not be noticed until "test 
> time", way down at the end of the project when the QA/verification people 
> get their hands on your code.  Instead of being noticed, perhaps, 
> months later just before shipping, you immediately fail one or two
> tests (or a whole pile) and fix the problem.
Yep, if the traditional way is to have verification/QA right at the
end then I don't advocate that either.
But I DO believe in some quantifiable metric. Some may well ask
'what does 100% statement coverage actually mean?', but it is an easy
metric to compute and very few would argue that 90% code coverage
by tests is better than 100%.

It could be that your suite of current tests causes both
clauses of all if statements to be exercised, but how do you know
what possible combinations of variables used in the if clause have
actually contributed to each choice? Your tests could test
functionality but still you may have a variable in an if statement
that is never exercised so that its changes affect the outcome:
something like 'if a or b==c:' when a is always false for some reason.
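
A small Python sketch of that situation follows. The function name, the
two test suites and the helper functions are made up for illustration.
The first suite takes both branches of the if statement, yet a is
always false, so nothing in it shows that a affects the outcome.

```python
from itertools import combinations


def choose(a, b, c):
    """The condition from the text: 'if a or b == c'."""
    if a or b == c:
        return "taken"
    return "not taken"


def branches_hit(tests):
    """Which branches of the if statement the suite exercises."""
    return {choose(a, b, c) for a, b, c in tests}


def a_shown_to_matter(tests):
    """True if two tests differ only in 'a' and give different outcomes."""
    return any(
        a1 != a2 and (b1, c1) == (b2, c2)
        and choose(a1, b1, c1) != choose(a2, b2, c2)
        for (a1, b1, c1), (a2, b2, c2) in combinations(tests, 2)
    )


# Hypothetical suites: 'a' is always False in the first one.
weak_suite = [(False, 1, 1), (False, 1, 2)]
strong_suite = weak_suite + [(True, 1, 2)]
```

Here branches_hit(weak_suite) contains both outcomes, so statement and
branch coverage look complete, while a_shown_to_matter(weak_suite) is
false. Adding the single case (True, 1, 2) makes it true.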

Coverage tools give you this kind of metric and allow you to test the
quality of the code produced, not just in terms of 'does it function
correctly when tested', but also in terms of 'how well do the tests
exercise the given code'.
You say in the next statement that the TDD methodology ensures no
extraneous code. I say *measure it*.
> Or, in spite of the fact that you actually *drove the development of
> the code with the tests*, and that therefore there is really no code
> that doesn't need to be there to pass the tests, you manage to let
> a bug get through.  Maybe it was more of an error in interpretation
> of the functional requirements.  In other words, almost certainly one
> of your tests is actually wrong.  Alternatively, the tests are all fine
> but you're in the unfortunate (but fortunately rare when you do it this
> way) position of having an actual, real _bug_ in spite of all those tests.
> 
> What do you do?  Add it to the bug database and see the graph go up?
> No, you don't even *have* a bug database!  There are no bugs to go in it,
> except this one.  What's the best next step?  Write a test!
In your scenario, I would graph test 'checkins' over time, assuming
tests are checked-in to some version control system when they 'work'.
> 
> The new test fails in the presence of the bug, and now you modify the
> code to pass the test (and to keep passing all those other tests) and
> check in the change.  Problem solved.  No bug, no bug database, no graph.
> 
> Maybe this sounds goofy or unrealistic to some who haven't tried it.
> Personally I thought it was novel enough to warrant an experiment when
> I first encountered it, but it didn't take long before I was convinced
> that this approach was fundamentally different and more powerful than
> the previous approaches I'd tried over twenty plus years of coding.
> It may not feel right to some people, but since it's pretty darn easy
> to read up on the approach and experiment for a few hours or days to
> get a feel for it, "don't knock it if you haven't tried it".  :-)
I don't doubt that XP and TDD can be an advance on previous methods;
I'm just wary of the lack of metrics. And I'm unsure of how you
alleviate the problem of only testing how one person or close team
thinks it should work. In HW design some teams have separate
Verification and Design groups and/or create two implementations in
different languages and compare their results. If you're writing the
test and writing the code, you can be blind to common errors.
Is TDD enough?
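
The two-implementation approach can be sketched in Python as below.
Both are in one language here for brevity, and the bit-counting task
and all names are assumptions made for the example; in practice the
two versions would ideally come from different people.

```python
import random


def popcount_design(n):
    """Hypothetical 'design' version: clear the lowest set bit each loop."""
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count


def popcount_reference(n):
    """Independent 'verification' model, written a different way."""
    return bin(n).count("1")


def compare_implementations(seed, runs=500, max_bits=64):
    """Feed both versions the same random inputs; return any mismatches."""
    rng = random.Random(seed)  # seeded so a mismatch can be reproduced
    mismatches = []
    for _ in range(runs):
        n = rng.getrandbits(rng.randint(1, max_bits))
        if popcount_design(n) != popcount_reference(n):
            mismatches.append(n)
    return mismatches
```

An empty list from compare_implementations(seed=3) means the two
versions agreed on every generated input; it does not prove either is
right, only that they do not share a disagreement on those inputs.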
> 
> To answer the original question of "how do you know when you're done?"
> I would say that TDD itself doesn't really say, but in XP you have
> what are called "acceptance tests", which are similar to the unit tests
> in that they are a fully automated suite of tests that verify the 
> high level functionality of the entire program.  When your code
> passes all the unit tests you have written to drive its development,
> *and* all the acceptance tests, then you're done.  (That's another one
> of the "things of beauty" in XP: the tests aggressively control scope
> since you don't have to build anything for which you don't have a test.)
OK, how do you gauge the quality of those acceptance tests + TDD +
generated program?
Graphs of TDD checkins and acceptance test checkins over time, as well
as in-depth coverage metrics might allow you to at least compare one
development effort with the next one and evolve a useful set of
'numbers' that attempt to abstract what a good design project looks
like.

Using XP without the stats makes it harder to encapsulate and
disseminate good project management principles.

I'd like to see people describe how to measure SW bug rates and
coverage (statement and branch etc.), using specific tools, then say
'don't ship until the bug rate is below X when measured using Y, and
the coverage metric is S when measured using tool T with options A, B
and C'.
A dumb follower of rules might still cock things up, but quantifiable
goals like that would help the non-dumb.
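
Such a rule could be written down as a simple check, sketched here in
Python. The function name, the metric names and every threshold value
are placeholders invented for the example, not recommendations; a real
project would substitute its own X and S and record which tools
produced the numbers.

```python
def unmet_ship_criteria(bugs_per_week, statement_coverage, branch_coverage,
                        max_bugs_per_week=1.0, min_statement=100.0,
                        min_branch=90.0):
    """Return the list of release criteria that are not yet satisfied.

    All thresholds are hypothetical defaults used only for illustration.
    Coverage values are percentages as reported by whatever tool is used.
    """
    unmet = []
    if bugs_per_week >= max_bugs_per_week:
        unmet.append("bug rate")
    if statement_coverage < min_statement:
        unmet.append("statement coverage")
    if branch_coverage < min_branch:
        unmet.append("branch coverage")
    return unmet
```

With these placeholder thresholds, unmet_ship_criteria(0.5, 100.0, 85.0)
reports only the branch coverage goal as outstanding.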
> -Peter

Donald 'Paddy' McCarthy.



