Hardware take on software testing.

Peter Hansen peter at engcorp.com
Mon Jun 9 22:49:52 CEST 2003

Michele Simionato wrote:
> Nevertheless, I think you are a little bit too enthusiastic.

Perhaps. We'd have to define how much is "too much", however, before
that could be debated.  :-)

> I am referring to statements like "with XP and TDD, you generally
> expect to have *no* bugs" and "you have effectively 100% code coverage".

We find we achieve 100% code coverage on code that is purely test-driven,
and we _do_ expect no bugs.  We find them from time to time, of course, 
but actually expecting them would cause us unnecessary stress and waste our time.
If we expected lots of bugs, we'd probably waste a large portion of our 
time planning formal code reviews, trying to do tests with randomized inputs, 
and other such things.  Those might be "nice" to have, in theory, but with 
so few bugs *actually found* we don't believe the benefits would outweigh
the costs, and so we continue to "expect" no bugs, even though from time 
to time we find them.  Make any sense?  I should have said "you should 
expect no bugs", and didn't mean to imply you will never have them.

> My concern is that it is effectively *impossible* to write a comprehensive
> sets of tests. 

I believe that's as seen from a traditional testing point of view.  We
don't write sets of tests:  we write one test, then we write the code
necessary to pass that test, then we clean things up and carry on.  
Nothing more.  If we attempted to write a comprehensive set of tests 
for a bunch of code we'd already written, we'd quickly go mad... 
*and* we'd find all kinds of areas where our tests were not adequate 
or where we had less than 100% coverage.
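
A minimal sketch of that one-test-at-a-time cycle, using Python's
unittest module.  The word_count helper and its tests are hypothetical,
made up purely for illustration; they are not from any real project.
Each test was (in this imagined history) written first and seen to
fail, and only then was just enough code added to make it pass.

```python
import unittest


def word_count(text):
    """Hypothetical helper: just enough code to pass the tests below."""
    # str.split() with no arguments splits on any run of whitespace
    # and returns an empty list for an empty string.
    return len(text.split())


class WordCountTest(unittest.TestCase):
    # Step 1: written before word_count existed, so it failed at first.
    def test_empty_string_has_no_words(self):
        self.assertEqual(word_count(""), 0)

    # Step 2: the next test, which forced a real implementation.
    def test_counts_whitespace_separated_words(self):
        self.assertEqual(word_count("red green refactor"), 3)
```

Saved to a file, the tests can be run with "python -m unittest" followed
by the file name.  Nothing exists in word_count that a test did not ask for.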

The key difference is in the nature of the code developed with the TDD
approach versus what we pump out without it.  With the latter approach,
you *need* a comprehensive set of tests to be confident it works right.
And as you say, that's an impossibility.  With TDD, you only write
what you've already tested, and you constantly refactor to remove 
duplication, and the resulting code often bears little resemblance to
code that is just hacked out the typical way.  I still consider it
magic, personally, but the code I write with TDD rarely has bugs, so
I can only point to what works, even if I can't explain the theory.
("Kent Beck's a freakin' genius" is one possible theory, I suppose.)

> You cannot imagine and tests all the possible usages of
> a piece of code you write, unless it is trivial. You cannot imagine and
> tests all the possible abuse your future user could perform. You cannot
> never be 100% sure of the correctness of your programs in all situations.

All true.  But with TDD, you can be quite sure that your code passes
all the tests you've written, and as long as you don't find yourself 
writing code that shouldn't be there, adding functionality that you 
don't need yet because you haven't written a test for it, you won't
often find yourself bothered by the probability that there is, in 
fact, a bug somewhere in there.  I wouldn't suggest this level of
unconcern for pacemaker software, I assure you, but we're not working
in an area that requires quite that much rigor.

> If it was so, Python (which is developed using LOTS of tests) would be
> bug free. 

I'd say Python has had lots of tests developed _for_ it, but by no means 
was it written test-first.  I don't think it's fair to compare the current
codebase against a purely imaginary one that I suppose would have 
resulted had TDD been used to develop Python.

> This is not the case, because of the subtle bugs you cannot prevent
> at writing time and that are discovered after months or years of usage. Unit
> testing is extremely effective againsts simple bugs (similarly to the compiler
> step in a compiled language, but much more effective), but it is powerless
> (actually in some case it helps, but I am considering the really hard bugs
> here) to prevent subtle bugs, exactly because they are subtle, i.e. you
> haven't thought of a test able to discover them when you first wrote the
> program. Of course, once the subtle bug has been discovered (maybe by your
> users), tests helps a LOT in tracking it and fixing it.

I definitely won't argue against the existence of subtle bugs that are 
extremely hard to catch with any testing done in advance.  It's just
not a good idea to write code that is prone to such problems, however,
so I'd encourage a different approach to writing code, which does
not lead to such problems very often.

> Let me take a concrete example, i.e. my recipe to solve the metaclass
> conflict I posted recently. Here the script is less than 20 line long.
> However, I cannot simply write a comprehensive test suite, since there
> are TONS of possibilities and it is extremely difficult for me to imagine
> all the use cases. I can write a big test suite of hundreds of lines,
> but still cannot be completely sure. I cannot spend weeks in writing the
> test suite for 20 lines of code!

But those 20 lines were not written test-first.  If you were to imagine
only *one* specific use case, the most important one for your own
purposes, and write a single test that exposes the most obvious and
easiest aspect of that one use case, and then implemented just enough
code to pass that one single test, how certain would you be that the
code had lots of bugs?  If you repeated that step over and over again,
constantly retesting, refactoring, and only adding code that already 
had a test for it, do you think you'd be quite so unsure about it?
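
To make that first step concrete, here is a rough sketch of what a
single first test for one metaclass-conflict use case might look like.
This is NOT the recipe under discussion: merged_metaclass, MetaA, MetaB,
A and B are all hypothetical names invented for the example, and the
code assumes modern Python 3 class syntax.  The helper deliberately does
only what this one test demands (two unrelated metaclasses) and makes no
attempt to handle other cases, such as one metaclass deriving from another.

```python
import unittest


def merged_metaclass(*bases):
    """Hypothetical helper: a metaclass derived from each base's metaclass."""
    metas = []
    for base in bases:
        meta = type(base)
        if meta not in metas:
            metas.append(meta)
    if len(metas) == 1:
        return metas[0]
    # Build a new metaclass that inherits from all the distinct ones.
    name = "Merged" + "".join(m.__name__ for m in metas)
    return type(name, tuple(metas), {})


class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


class FirstUseCaseTest(unittest.TestCase):
    def test_plain_inheritance_conflicts(self):
        # Documents the problem: without help, Python refuses this class.
        with self.assertRaises(TypeError):
            class C(A, B):
                pass

    def test_two_unrelated_metaclasses(self):
        # The one use case that matters most (in this imagined scenario).
        class C(A, B, metaclass=merged_metaclass(A, B)):
            pass
        self.assertIsInstance(C, MetaA)
        self.assertIsInstance(C, MetaB)
```

Further use cases would each arrive as one more failing test, followed
by just enough new code to pass it.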

Now I grant that if someone comes along and uses those 20 lines, no 
matter how many tests you've written for them, in a way you haven't
envisioned (and which is therefore not covered by your tested use
cases), then you might start to sweat, and even _expect_ bugs.  I'm
not sure you should, because the code is probably extremely
well-designed and robust at that point, but it's possible the new type
of use will expose an edge case you hadn't quite noticed or something.
So you write another test to catch it, refactor the code again, and
go back to sleep.
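
As an illustration of that last step, here is a small sketch in which a
new kind of use (an empty input) exposed an edge case.  The average
function is hypothetical, and returning 0.0 for an empty sequence is an
assumption chosen for the example, not a recommendation; the point is
only that the new test comes first and the guard is added to satisfy it.

```python
import unittest


def average(values):
    """Hypothetical function: mean of a sequence of numbers."""
    # This guard was added only after test_empty_input demanded it.
    # Returning 0.0 here is an assumed design choice for the example.
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageTest(unittest.TestCase):
    def test_typical_use(self):
        # The originally envisioned use case.
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_empty_input(self):
        # Added when the new use exposed the edge case; before the
        # guard existed this failed with ZeroDivisionError.
        self.assertEqual(average([]), 0.0)
```

Once the new test passes, it stays in the suite, so that particular
edge case cannot quietly come back.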

> These are really exaggerations in my view. But maybe it is good to
> exaggerate a bit if this serves to convert more and more people ;)

That much is true, of course.  For those who don't know this approach,
I'll happily make it explicit: making sweeping claims and polarizing
statements is a hallmark of XP proponents, but we assume the audience
is adult enough to take such claims with a grain of salt and
investigate the merits of the claims for themselves.  The extreme
nature of the claims no doubt serves (whether intentionally or
not) to encourage just that kind of skeptical investigation and
the resulting debate, to test the strength of the process.  If
I just went around saying "yeah, TDD is nice, but it won't solve
_all_ your problems", perhaps nobody would pay attention, and that
would, for them, be a shame, I believe.  :-)

