[Tutor] duplication in unit tests

Wed Dec 9 12:26:55 CET 2009

Serdar Tumgoren <zstumgoren at gmail.com> wrote:

> Hi everyone,
> I'm trying to apply some lessons from the recent list discussions on
> unit testing and Test-Driven Development, but I seem to have hit a
> sticking point.
> 
> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse the cleaned data and store
> it in a database. Currently I expect to have FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the following:
> 
> class FileCleaner(object):
>     def __init__(self, datastring):
>         self.source = datastring
> 
>     def convertEmDashes(self):
>         """Convert Unicode em dashes to minus signs"""
>         self.datastring = self.source.replace(u'\u2014','-')
> 
>     def splitLines(self):
>         """Generate and store a list of cleaned, non-empty lines"""
>         self.data = [x.strip() for x in
>                      self.datastring.strip().split('\n') if x.strip()]
> 
> 
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
> 
> import unittest
> 
> class TestFileCleaner(unittest.TestCase):
>     def setUp(self):
>         self.sourcestring = u"""This    line   has an em\u2014dash.\n
>                 So   does this  \u2014\n."""
>         self.cleaner = FileCleaner(self.sourcestring)
> 
>     def test_convertEmDashes(self):
>         """convertEmDashes should replace em dashes with minus signs
>         in the datastring attribute"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         self.cleaner.convertEmDashes()
>         self.assertEqual(teststring, self.cleaner.datastring)
> 
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         teststring = self.sourcestring.replace(u'\u2014','-')
>         data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
>         self.cleaner.convertEmDashes()
>         self.cleaner.splitLines()
>         self.assertEqual(data, self.cleaner.data)
> 
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).
> 
> I understand that TestCase's setUp method is called before each test
> is run (and therefore the FileCleaner object is created anew), but
> this coupling of a test to other methods of the class under test seems
> to violate the principle of testing methods in isolation.
> 
> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.
> 
> But that approach seems to violate the "best practice" of writing
> small methods. As you can tell, I'm a bit at sea on this.  Your
> guidance is greatly appreciated!!
> 
> Regards,
> Serdar
> 
> P.S. Recommendations on cleaning up and restructuring the code are also welcome!

Hello,

I think your confusion starts at the design level of your app. Testing and design both require you to state your expectations clearly. Here, the cleanup phase could be expressed as follows (not necessarily the best way, just an example):

plain source data = input   -->   output = ready-to-process data

As you can see, this requirement is, conceptually speaking, purely functional, in the plain sense of the word "function": data goes in, transformed data comes out. At least, that is how I see it.
Building an object just to implement it is, in my opinion, a misreading of OO design. (It's also writing Java in Python ;-) I would rather write it as a method of a higher-level object, possibly split into smaller methods if needed.
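
A minimal sketch of that design, written for Python 3 (the snippets above are Python 2 era, hence the u'' prefixes). Each cleanup step becomes a plain function from text to result, and a method on a hypothetical DataImporter class (standing in for the "higher-level object"; the class name and the snake_case function names are invented for illustration) simply chains them:

```python
EM_DASH = "\u2014"


def convert_em_dashes(text):
    """Return a copy of text with every em dash replaced by a minus sign."""
    return text.replace(EM_DASH, "-")


def split_lines(text):
    """Return the stripped, non-empty lines of text as a list."""
    # splitlines() also copes with '\r\n' line endings.
    return [line.strip() for line in text.splitlines() if line.strip()]


class DataImporter(object):
    """Hypothetical higher-level object that owns the whole import job."""

    def clean(self, source):
        """plain source data -> ready-to-process list of lines"""
        return split_lines(convert_em_dashes(source))


lines = DataImporter().clean("em\u2014dash\n\n  next line \u2014\n.")
print(lines)  # ['em-dash', 'next line -', '.']
```

Because no step stores intermediate state on self (unlike splitLines, which reads the datastring attribute that only convertEmDashes sets), each function can be called and tested on its own; the method is only the glue.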

Then, writing your tests is, in a sense, translating the requirement above into code: feed the code under test with raw input data and check that the output is as expected. As Kent put it well, you should test with typical, edge, *and wrong* input; in the latter case, the code under test is expected to fail (for instance, by raising an exception), and the test checks that it does.
You will have to hand-write or automatically generate input strings for each test. If the function is split, you will have to do this for each small function under test. This can be rather tedious, especially in cases like yours where the functions seem to operate logically in sequence, but there is no way around it. Actually, the various cleanup tasks (translating special characters, skipping blank lines, etc.) are fairly orthogonal: they don't need to be tested in sequence.
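
To illustrate the orthogonality point, here is a sketch of tests for a functional version of the cleanup, again in Python 3 with hypothetical snake_case names. Each small function gets its own hand-written input, so split_lines is tested without calling convert_em_dashes first, and one end-to-end test covers the whole input-to-output requirement. Treating any non-str argument as "wrong input" and raising TypeError is an assumption made for the example:

```python
import unittest

EM_DASH = "\u2014"


def convert_em_dashes(text):
    """Return a copy of text with every em dash replaced by a minus sign."""
    return text.replace(EM_DASH, "-")


def split_lines(text):
    """Return the stripped, non-empty lines of text as a list."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def clean(source):
    """plain source data -> ready-to-process list of lines"""
    # Assumption: "wrong" input means anything that is not a str.
    if not isinstance(source, str):
        raise TypeError("expected str, got %s" % type(source).__name__)
    return split_lines(convert_em_dashes(source))


class TestConvertEmDashes(unittest.TestCase):
    # Hand-written input: no need to call split_lines first.
    def test_typical(self):
        self.assertEqual(convert_em_dashes("em\u2014dash"), "em-dash")

    def test_no_dash_is_unchanged(self):  # edge case
        self.assertEqual(convert_em_dashes("plain"), "plain")


class TestSplitLines(unittest.TestCase):
    # Hand-written input with no em dashes: no need to call
    # convert_em_dashes first.
    def test_typical(self):
        self.assertEqual(split_lines("  one \n\n two\n"), ["one", "two"])

    def test_empty_string(self):  # edge case
        self.assertEqual(split_lines(""), [])


class TestClean(unittest.TestCase):
    # One end-to-end test of the whole input -> output requirement.
    def test_typical(self):
        source = "This line has an em\u2014dash.\n\n  So does this \u2014\n"
        self.assertEqual(clean(source),
                         ["This line has an em-dash.", "So does this -"])

    def test_wrong_input_is_rejected(self):
        with self.assertRaises(TypeError):
            clean(None)
```

Saved as, say, test_cleanup.py, it can be run with `python -m unittest test_cleanup`. No test depends on another having run, and setUp is not needed because each test builds its own small input.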

Denis
________________________________

life is strange

http://spir.wikidot.com/