[Tutor] duplication in unit tests

Kent Johnson kent37 at tds.net
Wed Dec 9 00:43:10 CET 2009


On Tue, Dec 8, 2009 at 6:02 PM, Serdar Tumgoren <zstumgoren at gmail.com> wrote:

> As part of my program, I'm planning to create objects that perform
> some initial data clean-up and then parse and database the cleaned
> data. Currently I'm expecting to have a FileCleaner and Parser
> classes. Using the TDD approach, I've so far come up with the below:
>
> class FileCleaner(object):
>    def __init__(self, datastring):
>        self.source = datastring
>
>    def convertEmDashes(self):
>        """Convert unicode emdashes to minus signs"""
>        self.datastring = self.source.replace(u'\u2014','-')
>
>    def splitLines(self):
>        """Generate and store a list of cleaned, non-empty lines"""
>        self.data = [x.strip() for x in
> self.datastring.strip().split('\n') if x.strip()]
>
>
> My confusion involves the test code for the above class and its
> methods. The only way I can get splitLines to pass its unit test is by
> first calling the convertEmDashes method, and then splitLines.
>
> class TestFileCleaner(unittest.TestCase):
>    def setUp(self):
>        self.sourcestring = u"""This    line   has an em\u2014dash.\n
>                So   does this  \u2014\n."""
>        self.cleaner = FileCleaner(self.sourcestring)
>
>    def test_convertEmDashes(self):
>        """convertEmDashes should remove minus signs from datastring
> attribute"""
>        teststring = self.sourcestring.replace(u'\u2014','-')
>        self.cleaner.convertEmDashes()
>        self.assertEqual(teststring, self.cleaner.datastring)
>
>    def test_splitLines(self):
>        """splitLines should create a list of cleaned lines"""
>        teststring = self.sourcestring.replace(u'\u2014','-')
>        data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
>        self.cleaner.convertEmDashes()
>        self.cleaner.splitLines()
>        self.assertEqual(data, self.cleaner.data)
>
> Basically, I'm duplicating the steps from the first test method in the
> second test method (and this duplication will accrue as I add more
> "cleaning" methods).

I see a few problems with this.

You are confused about what splitLines() does. It does not create a
list of cleaned lines, it just splits the lines. Because of your
confusion about splitLines(), your test is not just testing
splitLines(), it is testing convertEmDashes() and splitLines(). That
is why you have code duplication. test_splitLines() could look like
this:
   def test_splitLines(self):
       """splitLines should create a list of split lines"""
       teststring = self.sourcestring
       data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
       self.cleaner.splitLines()
       self.assertEqual(data, self.cleaner.data)

Your tests are not very good. They don't really test anything because
they use the same code that you are trying to test. What if
str.replace() or split() or strip() does not work the way you expect?
Your tests would not discover this. You should just hard-code the
expected result strings.  I would write test_splitLines() like this:
   def test_splitLines(self):
       """splitLines should create a list of split lines"""
       data = [u"This    line   has an em\u2014dash.", u"So   does
this  \u2014", u"."]
       self.cleaner.splitLines()
       self.assertEqual(data, self.cleaner.data)

You probably don't want to hard-code the source string in the setup
method. Typically you want to test a function with multiple inputs so
you can check its behaviour with typical values, edge cases and
invalid input. For example test_splitLines() doesn't verify that
splitLines() removes blank lines; that would be a good test case. You
might make a list of pairs of (input value, expected result) and pass
each one to splitLines().

> So my questions -- Am I misunderstanding how to properly write unit
> tests for this case? Or perhaps I've structured my program
> incorrectly, and that's what this duplication reveals? I suspected,
> for instance, that perhaps I should group these methods
> (convertEmDashes, splitLines, etc.) into a single larger function or
> method.

Yes, your tests are revealing a problem with the structure. You should
probably have a single process() method that does all the cleanup
methods and the split. Then you could also have a test for this.

There is really no need for a class here. You could write separate
functions for each cleanup and for the split, then another function
that puts them all together. This would be easier to test, too.

Kent


More information about the Tutor mailing list