[Tutor] duplication in unit tests

Serdar Tumgoren zstumgoren at gmail.com
Wed Dec 9 00:02:50 CET 2009

Hi everyone,
I'm trying to apply some lessons from the recent list discussions on
unit testing and Test-Driven Development, but I seem to have hit a
sticking point.

As part of my program, I'm planning to create objects that perform
some initial data clean-up and then parse the cleaned data and store
it in a database. Currently I expect to have FileCleaner and Parser
classes. Using the TDD approach, I've so far come up with the following:

class FileCleaner(object):
    def __init__(self, datastring):
        self.source = datastring

    def convertEmDashes(self):
        """Convert Unicode em dashes to minus signs (ASCII hyphens)"""
        # Creates self.datastring, which splitLines reads.
        self.datastring = self.source.replace(u'\u2014', '-')

    def splitLines(self):
        """Generate and store a list of cleaned, non-empty lines"""
        # Only works after convertEmDashes has set self.datastring.
        self.data = [x.strip() for x in
                     self.datastring.strip().split('\n') if x.strip()]
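To make the ordering dependency concrete, here is a small sketch that reuses a condensed copy of the class above; the sample string is a made-up input. Because only convertEmDashes creates self.datastring, calling splitLines on a fresh object raises AttributeError, while calling the two methods in order works.

```python
class FileCleaner(object):
    # Condensed copy of the class in the post, so this sketch runs standalone.
    def __init__(self, datastring):
        self.source = datastring

    def convertEmDashes(self):
        self.datastring = self.source.replace(u'\u2014', '-')

    def splitLines(self):
        # Reads self.datastring, which only convertEmDashes creates.
        self.data = [x.strip() for x in
                     self.datastring.strip().split('\n') if x.strip()]


# Hypothetical sample input.
sample = u"one\u2014two\n\n  three  \n"

# Calling splitLines on a fresh object fails: datastring does not exist yet.
failed_without_convert = False
try:
    FileCleaner(sample).splitLines()
except AttributeError as exc:
    failed_without_convert = True
    print("splitLines alone raised:", exc)

# Calling the methods in the expected order works.
cleaner = FileCleaner(sample)
cleaner.convertEmDashes()
cleaner.splitLines()
print(cleaner.data)  # ['one-two', 'three']
```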

My confusion involves the test code for the above class and its
methods. The only way I can get splitLines to pass its unit test is by
first calling the convertEmDashes method, and then splitLines.

import unittest

class TestFileCleaner(unittest.TestCase):
    def setUp(self):
        self.sourcestring = u"""This    line   has an em\u2014dash.\n
                So   does this  \u2014\n."""
        self.cleaner = FileCleaner(self.sourcestring)

    def test_convertEmDashes(self):
        """convertEmDashes should replace em dashes with minus signs"""
        self.cleaner.convertEmDashes()
        teststring = self.sourcestring.replace(u'\u2014', '-')
        self.assertEqual(teststring, self.cleaner.datastring)

    def test_splitLines(self):
        """splitLines should create a list of cleaned lines"""
        # splitLines needs the datastring that convertEmDashes creates.
        self.cleaner.convertEmDashes()
        self.cleaner.splitLines()
        teststring = self.sourcestring.replace(u'\u2014', '-')
        data = [x.strip() for x in teststring.strip().split('\n') if x.strip()]
        self.assertEqual(data, self.cleaner.data)

if __name__ == '__main__':
    unittest.main()

Basically, I'm duplicating the steps from the first test method in the
second test method (and this duplication will accrue as I add more
"cleaning" methods).

I understand that TestCase's setUp method is called before each test
is run (and therefore the FileCleaner object is created anew), but
this coupling of a test to other methods of the class under test seems
to violate the principle of testing methods in isolation.

So, my questions: am I misunderstanding how to properly write unit
tests for this case? Or perhaps I've structured my program
incorrectly, and that's what this duplication reveals? I suspected,
for instance, that perhaps I should group these methods
(convertEmDashes, splitLines, etc.) into a single larger function or

But that approach seems to violate the "best practice" of writing
small methods. As you can tell, I'm a bit at sea on this. Your
guidance is greatly appreciated!
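For comparison, here is one possible restructuring, offered only as a sketch under the assumption that the cleaning steps do not need to share object state: each step becomes a small function that takes its input as an argument and returns its output, and a separate clean() function chains the steps in order. The names convert_em_dashes, split_lines, and clean are hypothetical. Each step is then tested on its own input, and a single end-to-end test checks the ordering.

```python
import unittest


def convert_em_dashes(text):
    """Return text with Unicode em dashes replaced by ASCII hyphens."""
    return text.replace(u'\u2014', '-')


def split_lines(text):
    """Return a list of stripped, non-empty lines from text."""
    return [line.strip() for line in text.strip().split('\n') if line.strip()]


def clean(text):
    """Run every cleaning step in order; new steps would be added here."""
    return split_lines(convert_em_dashes(text))


class TestCleaningSteps(unittest.TestCase):
    def test_convert_em_dashes(self):
        self.assertEqual(convert_em_dashes(u"em\u2014dash"), u"em-dash")

    def test_split_lines(self):
        # No em dash step needed: split_lines is tested on its own input.
        self.assertEqual(split_lines(u"  a  \n\n b \n"), [u"a", u"b"])

    def test_clean(self):
        # One end-to-end check that the steps are chained in the right order.
        self.assertEqual(clean(u"x\u2014y\n\n z "), [u"x-y", u"z"])


suite = unittest.TestLoader().loadTestsFromTestCase(TestCleaningSteps)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

With this shape, adding a new cleaning step means one new small function, one test for it, and one extra call inside clean(), rather than repeating earlier steps in every later test.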


P.S. Recommendations on cleaning up and restructuring the code are also welcome!
