[Python-Dev] setUpClass and setUpModule in unittest

Sat Feb 13 18:46:26 CET 2010

On Fri, Feb 12, 2010 at 8:01 PM, Glyph Lefkowitz
<glyph at twistedmatrix.com> wrote:
> On Feb 11, 2010, at 1:11 PM, Guido van Rossum wrote:
>
>> I have skimmed this thread (hence this reply to the first rather than
>> the last message), but in general I am baffled by the hostility of
>> testing framework developers towards their users. The arguments
>> against class- and module-level setUp/tearDown functions seem to be
>> inspired by religion or ideology more than by the zen of Python. What
>> happened to Practicality Beats Purity?
>
> My sentiments tend to echo Jean-Paul Calderone's in this regard, but I think what he's saying bears a lot of repeating.  We really screwed up this feature in Twisted and I'd like to make sure that the stdlib doesn't repeat the mistake.  (Granted, we screwed it up extra bad <http://twistedmatrix.com/trac/ticket/2303>, but I do think many of the problems we encountered are inherent.)

Especially since you screwed up extra bad, the danger exists that
you're overreacting.

> The issue is not that we test-framework developers don't like our users, or want to protect them from themselves.  It is that our users - ourselves chief among them - desire features like "I want my tests to be transparently optimized across N cores and N disks".

Yeah, users ask for impossible features all the time. ;-)

Seriously, we do this at Google on a massive scale, for many languages
including Python. It works well but takes getting used to: while
time is saved waiting for tests, some time is wasted debugging tests
that run fine on the developer's workstation but not in the test
cluster. We've developed quite a few practices around this, which
include ways to override and control the test distribution, as well as
reports showing the historical "flakiness" of each test.

> I can understand how resistance to setUp/tearDown*Class/Module comes across as user-hostility, but I can assure you this is not the case.  It's subtle and difficult to explain how incompatible with these advanced features the *apparently* straightforward semantics of setting up and tearing down classes and modules are.  Most questions of semantics can be resolved with a simple decision, and it's not clear how that would interfere with other features.
>
> In Twisted's implementation of setUpClass and tearDownClass, everything seemed like it worked right up until the point where it didn't.  The test writer thinks that they're writing "simple" setUpClass and tearDownClass methods to optimize things, except almost by definition a setUpClass method needs to manipulate global state, shared across tests.  Which means that said state starts getting confused when it is set up and torn down concurrently across multiple processes.  These methods seem simple, but do they touch the filesystem?  Do they touch a shared database, even a little?  How do they determine a unique location to do that?  Without generally available tools to allow test writers to mess with the order and execution environment of their tests, one tends to write tests that rely on these implementation and ordering accidents, which means that when such a tool does arrive, things start breaking in unpredictable ways.

Been there, done that. The guideline should be that setUpClass and
friends save time but should still isolate themselves from other
copies that might run concurrently. E.g. if you have to copy a ton of
stuff into the filesystem, you should still put it in a temp dir with
a randomized name, and store that name as a class variable.
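A minimal sketch of that guideline, assuming the classmethod-style
setUpClass/tearDownClass hooks under discussion; the class name, the
directory prefix and the fixture file are made up for illustration:

```python
import os
import shutil
import tempfile
import unittest


class BigFixtureTest(unittest.TestCase):
    """Hypothetical test class; data.txt stands in for 'a ton of stuff'."""

    @classmethod
    def setUpClass(cls):
        # mkdtemp() creates a fresh directory with a randomized name, so
        # concurrent copies of this class never share a location.
        cls.workdir = tempfile.mkdtemp(prefix="bigfixture-")
        with open(os.path.join(cls.workdir, "data.txt"), "w") as f:
            f.write("expensive fixture contents\n")

    @classmethod
    def tearDownClass(cls):
        # Remove only the directory that this copy created.
        shutil.rmtree(cls.workdir, ignore_errors=True)

    def test_fixture_is_present(self):
        path = os.path.join(self.workdir, "data.txt")
        self.assertTrue(os.path.exists(path))

    def test_fixture_contents(self):
        with open(os.path.join(self.workdir, "data.txt")) as f:
            self.assertEqual(f.read(), "expensive fixture contents\n")
```

The tests only ever go through cls.workdir, so nothing depends on a
fixed path that another copy of the class might be using.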

When there's a global resource (such as a database) that really can't
be shared, well, you have to come up with a way to lock it -- that's
probably necessary even if your tests ran completely serialized,
unless there's only one developer and she never multitasks. :-)
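One possible way to do such locking, sketched with an exclusive lock
file; the lock path, helper names and timeout values are assumptions
for illustration, not an existing unittest facility:

```python
import os
import tempfile
import time
import unittest

# Hypothetical location; everyone sharing the resource must agree on it.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "shared-db-example.lock")


def acquire_lock(path, timeout=10.0, poll=0.05):
    """Create the lock file exclusively, retrying until the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock %s" % path)
            time.sleep(poll)


def release_lock(path):
    os.remove(path)


class SharedDatabaseTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Block until no other test run is using the shared resource.
        acquire_lock(LOCK_PATH)

    @classmethod
    def tearDownClass(cls):
        release_lock(LOCK_PATH)

    def test_lock_is_held(self):
        self.assertTrue(os.path.exists(LOCK_PATH))
```

A lock file like this stays behind if the process dies before
tearDownClass runs, so a real setup would also need some way to detect
and clear stale locks.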

>> The argument that a unittest framework shouldn't be "abused" for
>> regression tests (or integration tests, or whatever) is also bizarre
>> to my mind. Surely if a testing framework applies to multiple kinds of
>> testing that's a good thing, not something to be frowned upon?
>
> For what it's worth, I am a big fan of abusing test frameworks in general, and pyunit specifically, to perform every possible kind of testing.  In fact, I find setUpClass more hostile to *other* kinds of testing, because this convenience for simple integration tests makes more involved, performance-intensive integration tests harder to write and manage.

That sounds odd, as if the presence of this convenience would prohibit
you from also implementing other features.

>> On the other hand, I think we should be careful to extend unittest in
>> a consistent way. I shuddered at earlier proposals (on python-ideas)
>> to name the new functions (variations of) set_up and tear_down "to
>> conform with PEP 8" (this would actually have violated that PEP, which
>> explicitly prefers local consistency over global consistency).
>
> This is a very important point.  But, it's important not only to extend unittest itself in a consistent way, but to clearly describe the points of extensibility so that third-party things can continue to extend unittest themselves, and cooperate with each other using some defined protocol so that you can combine those tools.

Yeah, and I suspect that the original pyunit (now unittest) wasn't
always clear on this point.

> I tried to write about this problem a while ago <http://glyf.livejournal.com/72505.html> - the current extensibility API (which is mostly just composing "run()") is sub-optimal in many ways, but it's important not to break it.

I expect that *eventually* something will come along that is so much
better than unittest that, once matured, we'll want it in the stdlib.
(Or, alternatively, eventually stdlib inclusion won't be such a big
deal any more since distros mix and match. But then inclusion in a
distro would become every package developer's goal -- and then the
circle would be round, since distros hardly move faster than Python
releases...)

But in the meantime I believe evolving unittest is the right thing to
do. Adding new methods is relatively easy. Adding whole new paradigms
(like testresources) is a lot harder, especially in the light of the
latter's relative immaturity.

> And setUpClass does inevitably start to break those integration points down, because it implies certain things, like the fact that classes and modules are suites, or are otherwise grouped together in test ordering.

I expect that is what the majority of unittest users already believe.

> This makes it difficult to create custom suites, to do custom ordering, custom per-test behavior (like metrics collection before and after run(), or gc.collect() after each test, or looking for newly-opened-but-not-cleaned-up external resources like file descriptors after each tearDown).

True, the list never ends.

> Again: these are all concrete features that *users* of test frameworks want, not just idle architectural fantasy of us framework hackers.

I expect that most bleeding edge users will end up writing a custom
framework, or at least work with a bleeding edge framework that can
change rapidly to meet their urgent needs.

> I haven't had the opportunity to read the entire thread, so I don't know if this discussion has come to fruition, but I can see that some attention has been paid to these difficulties.  I have no problem with setUpClass or tearDownClass hooks *per se*, as long as they can be implemented in a way which explicitly preserves extensibility.

That's good to know. I have no doubt they (and setUpModule c.s.) can
be done in a clean, extensible way. And that doesn't mean we couldn't
also add other features -- after all, not all users have the same
needs. (If you read the Zen of Python, you'll see that TOOWTDI has
several qualifications. :-)

>> Regarding the objection that setUp/tearDown for classes would run into
>> issues with subclassing, I propose to let the standard semantics of
>> subclasses do their job. Thus a subclass that overrides setUpClass or
>> tearDownClass is responsible for calling the base class's setUpClass
>> and tearDownClass (and the TestCase base class should provide empty
>> versions of both). The testrunner should only call setUpClass and
>> tearDownClass for classes that have at least one test that is
>> selected.
>>
>> Yes, this would mean that if a base class has a test method and a
>> setUpClass (and tearDownClass) method and a subclass also has a test
>> method and overrides setUpClass (and/or tearDownClass), the base class's
>> setUpClass and tearDownClass may be called twice. What's the big deal? If
>> setUpClass and tearDownClass are written properly they should support
>> this.
>
> Just to be clear: by "written properly" you mean, written as classmethods, storing their data only on 'cls', right?

Yes. And avoiding referencing unique global resources (both within and
outside the current process).
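A small sketch of what "written properly" could look like under the
proposed semantics: classmethods that keep their state on cls and
chain to the base class explicitly. The class names and the items
attribute are invented for the example:

```python
import unittest


class BaseFixtureTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # State lives on cls, so each class that runs gets its own copy.
        cls.items = ["base"]

    @classmethod
    def tearDownClass(cls):
        del cls.items
        super().tearDownClass()

    def test_base_item(self):
        self.assertIn("base", self.items)


class DerivedFixtureTest(BaseFixtureTest):
    @classmethod
    def setUpClass(cls):
        # The subclass is responsible for calling the base class's hook.
        super().setUpClass()
        cls.items.append("derived")

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()

    def test_derived_item(self):
        self.assertEqual(self.items, ["base", "derived"])
```

When both classes are selected, the base setUpClass body runs twice,
once with cls bound to each class, and neither run sees the other's
list.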

>> If this behavior is undesired in a particular case, maybe what
>> was really meant were module-level setUp and tearDown, or the class
>> structure should be rearranged.
>
> There's also a bit of an open question here for me: if subclassing is allowed, and module-level setup and teardown are allowed, then what if I define a test class with test methods in module 'a', as well as module setup and teardown, then subclass it in 'b' which *doesn't* have setup and teardown... is the subclass in 'b' always assumed to depend on the module-level setup in 'a'?

You shouldn't be doing that kind of thing, but for definiteness, the
answer is "no". If you use class setup/teardown instead you can
control this via inheritance. At the module level, if you really want
to do this, b's module setup would have to explicitly call a's module
setup.

> Is there a way that it could be made not to if it weren't necessary? What if it stubs out all of its test methods?  In the case of classes you've got the 'cls' variable to describe the dependency and the shared state, but in the case of modules, inheritance doesn't create an additional module object to hold on to.

It should be "no" so that you can explicitly code up "yes" if you want
to. The other way around would be much messier, as you describe.

> testresources very neatly sidesteps this problem by just providing an API to say "this test case depends on that test resource", without relying on the grouping of tests within classes, modules, or packages.  Of course you can just define a class-level or module-level resource and then have all your tests depend on it, which gives you the behavior of setUpClass and setUpModule in a more general way.

I wish it was always a matter of "resources". I've seen use cases for
module-level setup that were much messier than that (e.g. fixing
import paths). I expect it will be a while before the testresources
design has been shaken out sufficiently for it to be included in the
stdlib.

-- 
--Guido van Rossum (python.org/~guido)