Case study: debugging failed assertRaises bug

Tue Apr 26 02:05:10 EDT 2011

I've just spent two hours banging my head against what I *thought* 
(wrongly!) was a spooky action-at-a-distance bug in unittest, so I 
thought I'd share it with anyone reading.

I have a unit test which I inherit from a mixin class that looks 
something like this:

    def testBadArgType(self):
        # Test failures with bad argument types.
        for d in (None, 23, object(), "spam"):
            self.assertRaises(TypeError, self.func, d)

Seems pretty simple, right? But it was mysteriously failing with one of 
my functions. The first problem is that assertRaises doesn't take a 
custom error message when it fails, so all I was seeing was this:

Traceback (most recent call last):
  File "src/test_stats_extras.py", line 1122, in testBadArgType
    self.assertRaises(TypeError, self.func, d)
AssertionError: TypeError not raised by <lambda>

Don't be put off by the lambda. That's just a thin wrapper around the 
function I actually am testing: the mixin test class assumes the test 
function takes a single argument, but my function actually requires two. 
So I use a lambda as a wrapper to provide the second argument.

Since I'm pretty sure that my function really does fail with TypeError, I 
add a line to fail *hard*. Immediately before the assertRaises, I add an 
extra call like this: 

_ = self.func(d)

Sure enough, it fails on the direct call to self.func, but not on the 
assertRaises. Now I'm getting seriously weirded out: if I call the test 
function (or its wrapper) directly, I get the expected TypeError, but if 
I call it via assertRaises, it doesn't raise.

So I bang away on the test for a while, replace the assertRaises with an 
assertEqual, and sure enough the assertEqual fails with an exception. At 
this point I'm *convinced* it's some weird bug in unittest, and I'm 
trying to come up with a short example so I can raise a bug report. But I 
can't replicate the bug.

So I ended up putting this in my test code:

            # This next line now PASSES (i.e. exception is raised),
            # but ONLY if it is followed by another (failing) test!
            self.assertRaises(Exception, self.func, d)
            self.assertEqual(self.func(d), 100)

and was nearly ready to throw in the towel and just flag the test as 
"skipped" and deal with it later, when I thought of the good old 
fashioned "add some prints inside your code". That's a bit trickier for 
unittest, because it prints its output to stderr rather than stdout, but 
in Python 3 that's easy to deal with. So I ended up with:

            func = self.func
            print("expecting TypeError ", end='', file=sys.stderr)
            try:
                _ = func(d)
            except TypeError:
                # I expect TypeError.
                print("and got it ", end='', file=sys.stderr)
            else:
                print("but got %r " % _, end='', file=sys.stderr)
            # This FAILS (i.e. no exception is raised).
            self.assertRaises(TypeError, func, d)

and unit test prints:

......................... expecting TypeError and got it expecting 
TypeError and got it expecting TypeError and got it expecting TypeError 
but got 'm' F......................................................
======================================================================
FAIL: testBadArgType (__main__.QuantileBehaviourTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "src/test_stats_extras.py", line 1122, in testBadArgType
    self.assertRaises(TypeError, func, d)
AssertionError: TypeError not raised by <lambda>

/facepalm/

That explains everything, even the spooky action-at-a-distance stuff. It 
is a bug in my own code, not unittest, not the mixin, not the lambda, but 
my function should be failing when called with a string but doesn't.

So, lessons learned...

(1) assertRaises REALLY needs a better error message. If not a custom 
message, at least it should show the result it got instead of an 
exception.

(2) Never underestimate the power of print. Sticking a print statement in 
the middle of problematic code is a powerful debugging technique.

(3) If you think you've found a bug in Python, it almost certainly is a 
bug in your own code. I knew that already, but now I have the scars to 
prove it.

(4) Unit tests should test one thing, not four. I thought my loop over 
four different "bad input" arguments was one conceptual test, and that 
ended up biting me. If the failing test was "testBadArgTypeStr" I would 
have realised what was going on much faster. 

(5) There's only so many tiny little tests you can write before your head 
explodes, so I'm going to keep putting "for arg in spam" loops in my 
tests. But now I know to unroll those suckers into individual tests on 
the first sign of trouble.

-- 
Steven