[Tutor] writing effective unittests

Steven D'Aprano steve at pearwood.info
Fri Jan 4 00:41:20 CET 2013


Hi Luke,

My responses inline below.


On 04/01/13 06:46, Luke Thomas Mergner wrote:

> I am on the digest version of the list, so I haven't gotten a copy of
>any replies.

All the more reason not to be on the digest version. But you should have
received replies *in the digest*.

(By the way, thank you for editing the subject line to something
meaningful.)


> What am I missing? The biggest problem is that no one is explaining
>the rationale behind testing.

You test software so that you know it does what you expect. Until you
actually try it, how do you know it will work as expected?

I'm reminded of a famous quote from Donald Knuth:

"Beware of bugs in the above code; I have only proved it correct, not
tried it."

All testing is aimed at making sure the software does what you want it
to do and does not contain bugs.


> The trivial examples compare integers: 2 == 2. At first glance this
>seems pointless.

As given, that is pointless, since there is no possible way for 2 to
fail to be 2. A less pointless example would be:

x = some_calculation(a, b, c)  # expected to return 2
assertEqual(x, 2)  # x might not be 2, so we test that it is

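Inside a unittest.TestCase, that check might look something like this
(some_calculation here is just a made-up stand-in for whatever function
you are really testing):

import unittest

def some_calculation(a, b, c):
    # Stand-in for the real function under test; for this input it is
    # expected to return 2.
    return a + b - c

class TestSomeCalculation(unittest.TestCase):
    def test_returns_expected_value(self):
        x = some_calculation(1, 3, 2)
        # x might not be 2, so we test that it is.
        self.assertEqual(x, 2)

if __name__ == '__main__':
    unittest.main()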

> I had assumed that tests would attempt to confuse my functions and
>teach me how to write more robust code.

That is part of it. A solid test suite should cover unexpected input
as well as expected input, to ensure that your code does the right
thing when faced with bad input.

In my earlier post, which I hope you received, I gave a toy example
of testing a function that takes at most one integer argument and
returns a string. Some of the tests I wrote check that the function
*correctly (and safely) fails* with TypeError if you pass more than
one argument, or if the argument is not an int.

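In case you didn't get it, a rough sketch of that sort of test looks
like this (the function spam is just an invented stand-in):

import unittest

def spam(n=0):
    # Invented stand-in: takes at most one int argument, returns a string.
    if not isinstance(n, int):
        raise TypeError('argument must be an int')
    return 'spam ' * n

class TestSpamFailures(unittest.TestCase):
    def test_non_int_argument_fails(self):
        # Bad input should fail cleanly with TypeError, not misbehave.
        self.assertRaises(TypeError, spam, 'cheese')

    def test_too_many_arguments_fail(self):
        # Passing more than one argument should also raise TypeError.
        self.assertRaises(TypeError, spam, 1, 2)

if __name__ == '__main__':
    unittest.main()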

>But I *think* now that tests are a way to determine if new code has
> changed old behaviors. Testing 2 == 2 is trivial, but if the
> function starts returning 3 in a few months, it would be helpful to
>know right away.

Yes, exactly!

We can write different sorts of tests:


* Doc tests are (mostly) intended to act as documentation. The problem
   with documentation is that it can get out of date as the software
   changes. How do you know when the documentation is out of date?
   Because the doc tests fail! (There is a small doctest sketch after
   this list.)

* Regression tests are intended to warn you if a bug has come back.
   Every time you fix a reported bug, you should have a regression test
   for that bug, so that if any future change accidentally *regresses*
   the code and reintroduces that bug, you will find out immediately.

* Unit tests are intended for testing that code behaves as expected,
   that the parts of the public interface you expect to exist actually
   do exist. There's no point documenting that you have a class Spam
   if the class doesn't actually exist, so you have a unit test that
   confirms that Spam exists and behaves as you expect.

(Note that the concept of "unit tests" is more general than the things
you write with the unittest module. The unittest module is a framework
for writing unit tests, but there are other frameworks, like nose.)

* Blackbox testing is when you treat functions and methods as black
   boxes. Since you can't see inside them, you can only test things
   which are public knowledge. E.g. "if I pass this input to the
   function, I should get this output".

* Whitebox testing is the opposite: you test based on your knowledge
   of the function's internal details. E.g. "if I pass this input to
   the function, it will run the code path A->B->C; but if I pass this
   other input, it will run the code path A->D->E->C. Therefore I need
   to test both inputs."

   The problem with whitebox testing is that if you change the
   function's implementation, your tests may become obsolete. But the
   advantage is that the tests cover more of the possible things that
   could happen.

* Fuzz testing is when you check how robust your code is by feeding
   it *random or corrupted data*, then seeing how your code responds
   or fails. One early example was "The Monkey" on Apple Macs in 1983,
   which fed random mouse clicks and key presses to MacPaint to
   discover bugs.

There are many other forms of testing that are possible:

https://en.wikipedia.org/wiki/Software_testing


But in general, any testing is better than no testing.


My further responses will follow later in the day.



-- 
Steven

