[Twisted-Python] Trial & the mock library

Hey all -

I've recently started working with the 'mock' library in our trial tests, and am looking for some best-practice advice. I'm really just starting to get used to the library, so it may well have functionality that I'm unaware of or am misusing.

I very quickly ran into a problem where I mistakenly returned a Mock() in place of a Deferred, causing the asserts in the callbacks never to be called and the test to spuriously pass. A basic example:

    def test_foo(self):
        d = Mock()

        def check_result(res):
            self.assertEqual(res.code, expected)  # never called

        d.addCallback(check_result)
        return d  # Mock is truthy, test passes

This occurred where I was mocking some internals of the class under test; a slightly more believable example:

== myclass.py ==

    def some_function(self, ...):
        d = self.authenticate()
        d.addCallback(foo)  # foo never called
        d.addErrback(bar)   # bar never called
        return d

== test_myclass.py ==

    def setUp(self):
        self.resource.authenticate = Mock(return_value=Mock())

    def test_foo(self):
        d = self.resource.some_function()

        def check_result(res):  # never called
            self.assertEqual(res.code, expected)

        d.addCallback(check_result)
        return d  # test passes

Currently, I'm experimenting with wrapping Mock instantiations by defining common Deferred methods on them in advance; this approach would eventually lead to extending Mock itself with this functionality.

    def nonDeferredMock():
        m = Mock()

        def notimpl(*args, **kwargs):
            raise NotImplementedError('You treated a Mock like a Deferred!')

        m.addCallback = notimpl
        m.addErrback = notimpl
        m.addBoth = notimpl
        m.addCallbacks = notimpl
        return m

Another approach might be extending TestCase to check that return values are never Mock objects.

Does anyone on the list have experience with this? Obviously, this only happens when mistakes are made while writing tests, but I'd rather have confidence that when my tests pass, they've passed for the right reasons.

Another antipattern I've come across is:

    resource.mymethod = Mock(return_value=defer.succeed(None))

which works fine for tests in which mymethod() is called once, but always returns the same Deferred object if multiple calls are made. What would be a better approach?

Cheers-
James
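One possible alternative for that last antipattern, sketched under the assumption that the test module can import both mock and twisted.internet.defer (and with 'resource' standing in for the object from the example above): mock's side_effect accepts a callable, and that callable's return value becomes the call's return value, so each call can build a fresh Deferred instead of sharing a single pre-fired one.

    import mock  # or `from unittest import mock` on Python 3.3+
    from twisted.internet import defer

    # side_effect is invoked on every call; because the lambda returns
    # something other than mock.DEFAULT, its return value is used as the
    # call's return value, so each call to mymethod() gets its own Deferred.
    resource.mymethod = mock.Mock(
        side_effect=lambda *args, **kwargs: defer.succeed(None))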

In addition to what jml said, I wonder if it makes sense for TestCase to raise when the return value of a test method is something other than None or a Deferred... cheers lvh
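One minimal way to express that check, as a sketch layered on top of an ordinary trial TestCase rather than a change to trial itself (the decorator name checkReturn is invented here):

    import functools
    from twisted.internet.defer import Deferred

    def checkReturn(testMethod):
        """Error out if a test method returns anything other than
        None or a Deferred (for example, a stray Mock)."""
        @functools.wraps(testMethod)
        def wrapper(self, *args, **kwargs):
            result = testMethod(self, *args, **kwargs)
            if result is not None and not isinstance(result, Deferred):
                raise TypeError("%s returned %r; expected None or a Deferred"
                                % (testMethod.__name__, result))
            return result
        return wrapper

Applied as @checkReturn on a test method, a returned Mock becomes an immediate error in the trial report instead of a silent pass.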

On Jul 25, 2013, at 8:40 AM, Jonathan Lange <jml@mumak.net> wrote:
That sounds like a great idea, I wonder if anyone's thought of it before. We already encode the information in epytext. Should we make it a dependency, so it can be parsed at runtime to aid with enforcement? -glyph

On 08:33 pm, glyph@twistedmatrix.com wrote:
Please finish the Lore -> Sphinx transition first so that we can begin investigating whether reStructuredText for API documentation is sensible. We don't want to drag in an epytext parsing dependency if we're just going to switch to docutils in eight or nine years. Jean-Paul

On Jul 25, 2013, at 3:51 PM, exarkun@twistedmatrix.com wrote:
That's a good point, but I wouldn't want to block on it. We could easily implement a simple abstraction layer for type identification that layers and translates between epydoc and ReST; we'd probably need this during the transitional period anyway, since lore->sphinx isn't pydoctor->sphinx. Plus, of course, we'd need that abstraction layer to support multiple different styles of py3k function annotations, if we're talking about things that people might use in eight or nine years. -glyph

I have a few thoughts:

First, how does this hypothetical system for specifying return types solve the original problem (that user-written methods on TestCase pass unexpectedly when a non-Deferred is returned)? If I'm the one writing test_whatever, with the proposed doc string method for specifying return type, then wouldn't I need to write a docstring that specifies the return type as Deferred?

Second, I don't love the idea of the docstring changing how a function behaves... I like that it's a free-form blob of text. I think I'd rather see decorators used for this purpose. I understand that one benefit of putting the info in the docstring is that it ensures the docstring will always be accurate. But you could just as easily extract the return type from the decorator for generating HTML docs, and people reading the source could see the decorators. Also, we can still leverage the existing info recorded in the docstring by doing a one-time pass over the code to turn the docstring info into decorators.

My two cents

On Jul 25, 2013 5:13 PM, "Glyph" <glyph@twistedmatrix.com> wrote:
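The decorator being described could be sketched along these lines (the name 'returns' and the '_declared_return_type' attribute are invented for illustration; a real implementation would also need support in the documentation generator):

    import functools

    def returns(rtype):
        """Record a declared return type on a function and check it at
        call time; doc tools could read the recorded attribute instead of
        (or alongside) an @rtype field in the docstring."""
        def decorator(f):
            @functools.wraps(f)
            def wrapper(*args, **kwargs):
                result = f(*args, **kwargs)
                if result is not None and not isinstance(result, rtype):
                    raise TypeError("%s declared to return %s, got %r"
                                    % (f.__name__, rtype.__name__, result))
                return result
            wrapper._declared_return_type = rtype  # readable by doc generators
            return wrapper
        return decorator

A test author would then write @returns(Deferred) above test_whatever, and a stray Mock would blow up at call time rather than letting the test pass.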

On Jul 25, 2013, at 6:51 PM, Matt Haggard <haggardii@gmail.com> wrote:
I have a few thoughts:
First, how does this hypothetical system for specifying return types solve the original problem (that user-written methods on TestCase pass unexpectedly when a non-Deferred is returned)? If I'm the one writing test_whatever, with the proposed doc string method for specifying return type, then wouldn't I need to write a docstring that specifies the return type as Deferred?
All of this follows quite obviously if you just consider that a functional specification document (in other words, the generalization of the docstring) is simply a partial bijective homomorphism over the orbifold of possible runtime states (into the Hilbert space with a boundary defined by the NxM coordinate matrix of all possible intersections of sets of bugs and non-bugs of course).
Second, I don't love the idea of the docstring changing how a function behaves... I like that it's a free form blob of text. I think I'd rather see decorators used for this purpose.
It's not so much changing how the function behaves but rather what its behavior is. In any case, there's a great paper which offers an excellent theoretical background that should explain what I mean by that distinction, and should really elucidate some of the other threads in this conversation as well: <http://isotropic.org/papers/chicken.pdf> Hope that clears things up, -glyph

On Fri, Jul 26, 2013 at 12:12 AM, Glyph <glyph@twistedmatrix.com> wrote:
No need. If we implement it with Zope 3 components (I think it's called pyramid or grok now or something?), then we can rely on their py3k support. I am somewhat troubled by the security implications of this feature, especially when used in conjunction with manhole. jml

That sounds like a great idea, I wonder if anyone's thought of it before.
I suggested this in #twisted a few years back and was immediately told it was a bad idea (names withheld!). Another case in which this pops up is if you accidentally yield some deferreds in a test but don't decorate with @inlineCallbacks. Trial runs the test function, gets back a generator object, and thinks "yep, ok, that's not an error". I've hit that multiple times and know several experienced Twisted users who have too. T On Thu, Jul 25, 2013 at 9:33 PM, Glyph <glyph@twistedmatrix.com> wrote:
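The yield trap reads roughly like this (a sketch only; fetchAnswer is a made-up helper to keep it self-contained, and the behaviour shown is the one described above for the trial of that era):

    from twisted.internet import defer
    from twisted.trial import unittest

    def fetchAnswer():
        # stand-in for some API that returns a Deferred
        return defer.succeed(42)

    class YieldTrap(unittest.TestCase):

        # Missing @defer.inlineCallbacks: calling this produces a generator
        # object that is never iterated, so the body (and its assertion)
        # never runs, and trial reports the test as passing.
        def test_broken(self):
            result = yield fetchAnswer()
            self.assertEqual(result, 42)  # never reached

        @defer.inlineCallbacks
        def test_fixed(self):
            result = yield fetchAnswer()
            self.assertEqual(result, 42)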

On 25 Jul, 02:25 pm, jamesbroadhead@gmail.com wrote:
To address this problem, I suggest you get into the habit of watching your unit tests fail in the expected way before you make the necessary implementation changes to make them pass. This is only one of an unlimited number of ways your unit tests can be buggy. It might be tempting to try to fix the test runner to prevent you from ever falling into this trap again - and who knows, it might even be a good idea. However, if you run your tests and see them fail in the way you expected them to fail before you write the code that makes them pass, then you will be sure to avoid the many, many, many *other* pitfalls that have nothing to do with accidentally returning the wrong object. This is just one of the attractions of test-driven development for me. Jean-Paul

On Jul 26, 2013, at 7:12 AM, exarkun@twistedmatrix.com wrote:
On a more serious note than our previous digression, perhaps *this* is the thing we should be modifying Trial to support.

The vast majority of Twisted committers do development this way - or at least aspire to, most of the time - but to someone new to automated testing, it's not entirely clear how you're supposed to use something like Trial, or how important it is that you see the tests fail first.

Perhaps it would be useful if trial had a bit more of a memory of things that happened between test runs. For example, a mode where you could tell it what you're working on, re-run the same thing, and only get a 'success' when you went back and forth between red and green. Here's a silly little narrative about how one might use such a thing:

$ tribulation begin myproject
Beginning a time of turmoil for python package 'myproject', in './myproject/'.
myproject.test_1
  Case1
    test_1 ... [OK]
-------------------------------------------------------------------------------
Ran 2 tests in 0.033s

PROCEED (successes=1) - All tests passing, an auspicious beginning. Now write a failing test.

$ tribulation continue
myproject.test_1
  Case1
    test_1 ... [OK]
myproject.test_2
  Case2
    test_2 ... [OK]
-------------------------------------------------------------------------------
Ran 2 tests in 0.033s

AGAIN (successes=2) - a test should have failed.

# oops, 'test_2' was just 'pass'... let me fix that

$ tribulation continue
$ tribulation begin myproject
Beginning a time of turmoil for python package 'myproject', in './myproject/'.
myproject.test_1
  Case1
    test_1 ... [OK]
myproject.test_2
  Case2
    test_2 ... [FAIL]
-------------------------------------------------------------------------------
Ran 2 tests in 0.450s

PROCEED (successes=2) - we are working on myproject.Case2.test_2 now.

$ tribulation continue
myproject.test_2
  Case2
    test_2 ... [FAIL]
-------------------------------------------------------------------------------
Ran 1 tests in 0.020s

AGAIN (successes=2) - you should have made the test pass.

$ tribulation continue
myproject.test_2
  Case2
    test_2 ... [OK]
-------------------------------------------------------------------------------
Ran 1 tests in 0.01s

PROCEED (successes=1) - myproject.Case2.test_2 works now, let's make sure nothing else broke.

$ tribulation continue
myproject.test_1
  Case1
    test_1 ... [OK]
myproject.test_2
  Case2
    test_2 ... [OK]
-------------------------------------------------------------------------------
Ran 2 tests in 0.033s

PROCEED (successes=2) - no regressions, find the next thing to work on

$ tribulation conclude
You have received one billion points, congratulations you have defeated software.

Does this seem like it might be a useful feature for someone to work on? Not shown here is the part where, when you do introduce a regression, it runs just the tests that failed until you fix all of them, then goes back up the suite until it reaches the top and you move on to the next thing...

-glyph
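The red/green bookkeeping in that narrative could be as small as the following (purely a sketch of the state logic described above, not of a real tool; all names are invented):

    # States alternate: after a green run you must produce a failing test
    # ("AGAIN" if everything still passes), and after a red run you must
    # make it pass ("AGAIN" if something still fails).
    def advance(expecting, any_failures):
        """Return (verdict, next_expectation) for one run of the tool."""
        if expecting == "red":
            if any_failures:
                return "PROCEED", "green"  # good: you wrote a failing test
            return "AGAIN", "red"          # a test should have failed
        else:  # expecting == "green"
            if any_failures:
                return "AGAIN", "green"    # you should have made the test pass
            return "PROCEED", "red"        # all green: write the next failing test

    # e.g. advance("red", any_failures=False) -> ("AGAIN", "red"),
    # matching the second run in the narrative above.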

On Sat, Jul 27, 2013 at 5:26 AM, Glyph <glyph@twistedmatrix.com> wrote:
I like the idea. testrepository <http://testrepository.readthedocs.org/en/latest/> stores test results in a database and already has support for running just the tests that failed, etc. It just runs whatever executable you give it, provided that executable outputs subunit. Not on a computer with it installed (and can't find a good webpage), but here's the basic gist:

$ testr init          # initialize the database
$ testr run           # runs everything
...                   # test output goes here
$ testr run           # runs everything, shows delta of tests failed & time taken
$ testr run --failing # just run what failed

It doesn't have the TDD straitjacket you describe, but that would be a fun thing to make.

jml

participants (7):
- exarkun@twistedmatrix.com
- Glyph
- James Broadhead
- Jonathan Lange
- Laurens Van Houtven
- Matt Haggard
- Terry Jones