[Python-Dev] setUpClass and setUpModule in unittest

Glyph Lefkowitz glyph at twistedmatrix.com
Sat Feb 13 05:01:41 CET 2010

On Feb 11, 2010, at 1:11 PM, Guido van Rossum wrote:

> I have skimmed this thread (hence this reply to the first rather than
> the last message), but in general I am baffled by the hostility of
> testing framework developers towards their users. The arguments
> against class- and module-level setUp/tearDown functions seem to be
> inspired by religion or ideology more than by the zen of Python. What
> happened to Practicality Beats Purity?

My sentiments tend to echo Jean-Paul Calderone's in this regard, but I think what he's saying bears a lot of repeating.  We really screwed up this feature in Twisted and I'd like to make sure that the stdlib doesn't repeat the mistake.  (Granted, we screwed it up extra bad <http://twistedmatrix.com/trac/ticket/2303>, but I do think many of the problems we encountered are inherent.)

The issue is not that we test-framework developers don't like our users, or want to protect them from themselves.  It is that our users - ourselves chief among them - desire features like "I want my tests to be transparently optimized across N cores and N disks".

I can understand how resistance to setUp/tearDown*Class/Module comes across as user-hostility, but I can assure you this is not the case.  It's subtle and difficult to explain how incompatible the *apparently* straightforward semantics of setting up and tearing down classes and modules are with these advanced features.  Most questions of semantics can be resolved with a simple decision, and it's not obvious at the time how that decision will interfere with other features.

In Twisted's implementation of setUpClass and tearDownClass, everything seemed to work right up until the point where it didn't.  The test writer thinks that they're writing "simple" setUpClass and tearDownClass methods to optimize things, except that, almost by definition, a setUpClass method needs to manipulate global state shared across tests, which means that said state starts getting confused when it is set up and torn down concurrently across multiple processes.  These methods seem simple, but do they touch the filesystem?  Do they touch a shared database, even a little?  How do they determine a unique location to do that?  Without generally available tools to allow test writers to mess with the order and execution environment of their tests, one tends to write tests that rely on these implementation and ordering accidents, which means that when such a tool does arrive, things start breaking in unpredictable ways.
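
To make the "unique location" question concrete, here is a minimal sketch, written with the classmethod hooks as unittest now spells them.  The class name, the `scratch` attribute and the `run_once` helper are hypothetical.  The point is only that the fixture asks for a fresh directory each time instead of hard-coding a shared path, so two processes running the same class do not trample each other's files.

```python
import os
import shutil
import tempfile
import unittest


class UniqueScratchDirTests(unittest.TestCase):
    """Hypothetical example of a class-level fixture that touches the filesystem."""

    @classmethod
    def setUpClass(cls):
        # mkdtemp() creates a new, uniquely named directory on every call,
        # so concurrent runs of this class do not share a path.
        cls.scratch = tempfile.mkdtemp(prefix="scratch-")

    @classmethod
    def tearDownClass(cls):
        shutil.rmtree(cls.scratch, ignore_errors=True)

    def test_write_and_read(self):
        path = os.path.join(self.scratch, "data.txt")
        with open(path, "w") as f:
            f.write("hello")
        with open(path) as f:
            self.assertEqual(f.read(), "hello")


def run_once():
    """Run the class once; return the result and the directory it used."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UniqueScratchDirTests)
    result = unittest.TestResult()
    suite.run(result)
    return result, UniqueScratchDirTests.scratch
```

A fixture that used a fixed path such as a well-known directory under /tmp would pass when run alone and fail intermittently once a runner started executing the class in several processes at once.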

> The argument that a unittest framework shouldn't be "abused" for
> regression tests (or integration tests, or whatever) is also bizarre
> to my mind. Surely if a testing framework applies to multiple kinds of
> testing that's a good thing, not something to be frowned upon?

For what it's worth, I am a big fan of abusing test frameworks in general, and pyunit specifically, to perform every possible kind of testing.  In fact, I find setUpClass more hostile to *other* kinds of testing, because this convenience for simple integration tests makes more involved, performance-intensive integration tests harder to write and manage.

> On the other hand, I think we should be careful to extend unittest in
> a consistent way. I shuddered at earlier proposals (on python-ideas)
> to name the new functions (variations of) set_up and tear_down "to
> conform with PEP 8" (this would actually have violated that PEP, which
> explicitly prefers local consistency over global consistency).

This is a very important point.  But it's important not only to extend unittest itself in a consistent way, but also to clearly describe the points of extensibility, so that third-party tools can continue to extend unittest themselves and cooperate with each other through some defined protocol that lets you combine them.

I tried to write about this problem a while ago <http://glyf.livejournal.com/72505.html> - the current extensibility API (which is mostly just composing "run()") is sub-optimal in many ways, but it's important not to break it.

And setUpClass does inevitably start to break those integration points down, because it implies certain things, like the fact that classes and modules are suites, or are otherwise grouped together in test ordering.  This makes it difficult to create custom suites, to do custom ordering, or to add custom per-test behavior (like metrics collection before and after run(), or gc.collect() after each test, or looking for newly-opened-but-not-cleaned-up external resources like file descriptors after each tearDown).
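
As an illustration of the per-test behavior mentioned above, here is a minimal sketch of composing run() to force a garbage collection after every test.  `CollectingTestCase`, its `collections` counter and `run_example` are hypothetical names; the counter exists only to show that the hook fires once per test.

```python
import gc
import unittest


class CollectingTestCase(unittest.TestCase):
    """Hypothetical base class: collect garbage after each individual test."""

    collections = 0  # how many times the hook has run, for illustration only

    def run(self, result=None):
        try:
            return super().run(result)
        finally:
            # Runs after setUp, the test method and tearDown have all finished.
            gc.collect()
            CollectingTestCase.collections += 1


class ExampleTests(CollectingTestCase):
    def test_one(self):
        self.assertTrue(True)

    def test_two(self):
        self.assertEqual(1 + 1, 2)


def run_example():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

This kind of wrapper only sees individual tests.  It has no natural place to observe state that a class-level or module-level fixture holds open between tests, which is where the two features start to interact.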

Again: these are all concrete features that *users* of test frameworks want, not just idle architectural fantasy of us framework hackers.

I haven't had the opportunity to read the entire thread, so I don't know if this discussion has come to fruition, but I can see that some attention has been paid to these difficulties.  I have no problem with setUpClass or tearDownClass hooks *per se*, as long as they can be implemented in a way which explicitly preserves extensibility.

> Regarding the objection that setUp/tearDown for classes would run into
> issues with subclassing, I propose to let the standard semantics of
> subclasses do their job. Thus a subclass that overrides setUpClass or
> tearDownClass is responsible for calling the base class's setUpClass
> and tearDownClass (and the TestCase base class should provide empty
> versions of both). The testrunner should only call setUpClass and
> tearDownClass for classes that have at least one test that is
> selected.
> Yes, this would mean that if a base class has a test method and a
> setUpClass (and tearDownClass) method and a subclass also has a test
> method and overrides setUpClass (and/or tearDownClass), the base class's
> setUpClass and tearDownClass may be called twice. What's the big deal? If
> setUpClass and tearDownClass are written properly they should support
> this.

Just to be clear: by "written properly" you mean, written as classmethods, storing their data only on 'cls', right?
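
One reading of "written properly", sketched under the assumption that the hooks are classmethods and that the runner sets up each class that has selected tests.  The class names, the `log` attribute and `run_both` are hypothetical.  The base class's setUpClass runs once for the base class and once more (via super) for the subclass, but each run stores its state on a different `cls`, so the two do not interfere.

```python
import unittest


class BaseTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # State is stored on whichever class is currently being set up.
        cls.log = ["base setup for " + cls.__name__]

    @classmethod
    def tearDownClass(cls):
        del cls.log

    def test_base(self):
        self.assertIn("base setup", self.log[0])


class DerivedTests(BaseTests):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()  # the subclass is responsible for this call
        cls.log.append("derived setup")

    @classmethod
    def tearDownClass(cls):
        super().tearDownClass()

    def test_derived(self):
        self.assertEqual(self.log[-1], "derived setup")


def run_both():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(BaseTests),
        loader.loadTestsFromTestCase(DerivedTests),
    ])
    result = unittest.TestResult()
    suite.run(result)
    return result
```

DerivedTests inherits test_base, so three tests run in total: one under BaseTests and two under DerivedTests.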

> If this behavior is undesired in a particular case, maybe what
> was really meant were module-level setUp and tearDown, or the class
> structure should be rearranged.

There's also a bit of an open question here for me: if subclassing is allowed, and module-level setup and teardown are allowed, then what if I define a test class with test methods in module 'a', as well as module setup and teardown, then subclass it in 'b' which *doesn't* have setup and teardown... is the subclass in 'b' always assumed to depend on the module-level setup in 'a'?  Is there a way for it to opt out of that dependency when it isn't needed?  What if it stubs out all of its test methods?  In the case of classes you've got the 'cls' variable to describe the dependency and the shared state, but in the case of modules, inheritance doesn't create an additional module object to hold on to.

testresources very neatly sidesteps this problem by just providing an API to say "this test case depends on that test resource", without relying on the grouping of tests within classes, modules, or packages.  Of course you can just define a class-level or module-level resource and then have all your tests depend on it, which gives you the behavior of setUpClass and setUpModule in a more general way.
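
The idea of declaring a dependency rather than inferring it from grouping can be sketched as follows.  This is NOT the testresources API; `Resource`, `ResourcedCase`, `CONFIG` and the `resources` mapping are hypothetical stand-ins, and the sketch leaves out teardown entirely.  It only shows two unrelated test classes naming the same resource and sharing a single build of it.

```python
import unittest


class Resource:
    """Hypothetical shared resource: built on first use, reused afterwards."""

    def __init__(self, factory):
        self._factory = factory
        self._built = False
        self._value = None
        self.builds = 0  # number of times the factory ran, for illustration

    def get(self):
        if not self._built:
            self._value = self._factory()
            self._built = True
            self.builds += 1
        return self._value


class ResourcedCase(unittest.TestCase):
    """Hypothetical base class: tests list what they need in `resources`."""

    resources = {}  # attribute name -> Resource

    def setUp(self):
        for name, resource in self.resources.items():
            setattr(self, name, resource.get())


CONFIG = Resource(lambda: {"answer": 42})


class InOneModule(ResourcedCase):
    resources = {"config": CONFIG}

    def test_answer(self):
        self.assertEqual(self.config["answer"], 42)


class Elsewhere(ResourcedCase):
    resources = {"config": CONFIG}

    def test_has_answer(self):
        self.assertIn("answer", self.config)


def run_resourced():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(InOneModule),
        loader.loadTestsFromTestCase(Elsewhere),
    ])
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Because the dependency is attached to each test rather than to its class or module, a runner is free to reorder or split the tests without changing what they need.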

