[Tutor] duplication in unit tests
Dave Angel
davea at ieee.org
Wed Dec 9 07:19:35 CET 2009
Serdar Tumgoren wrote:
> Hi Kent and Lie,
>
> First, thanks to you both for the help. I reworked the tests and then
> the main code according to your suggestions (I really was muddling
> these TDD concepts!).
>
> The reworked code and tests are below. In the tests, I hard-coded the
> source data and the expected results; in the main program code, I
> eliminated the FileCleaner class and converted its methods to
> stand-alone functions. I'm planning to group them into a single,
> larger "process" function as you all suggested.
>
> Meantime, I'd be grateful if you could critique whether I've properly
> followed your advice. And of course, feel free to suggest other tests
> that might be appropriate. For instance, would it make sense to test
> convertEmDashes for non-unicode input?
>
> Thanks again!
> Serdar
>
> #### test_cleaner.py ####
> import unittest
> from cleaner import convertEmDashes, splitLines
>
> class TestCleanerMethods(unittest.TestCase):
>     def test_convertEmDashes(self):
>         """convertEmDashes to minus signs"""
>         srce = u"""This line has an em\u2014dash.\nSo does this \u2014.\n"""
>         expected = u"""This line has an em-dash.\nSo does this -.\n"""
>         result = convertEmDashes(srce)
>         self.assertEqual(result, expected)
>
>     def test_splitLines(self):
>         """splitLines should create a list of cleaned lines"""
>         srce = u"""This line has an em\u2014dash.\nSo does this \u2014.\n"""
>         expected = [u'This line has an em\u2014dash.', u'So does this \u2014.']
>         result = splitLines(srce)
>         self.assertEqual(result, expected)
>
>
> #### cleaner.py ####
> def convertEmDashes(datastring):
>     """Convert unicode emdashes to minus signs"""
>     datastring = datastring.replace(u'\u2014','-')
>
I think the replacement '-' should be a unicode literal too, at least if
you're expecting the datastring to be unicode:
    datastring = datastring.replace(u'\u2014', u'-')
It will probably be slightly more efficient, but more importantly, it
makes clear what you're expecting.
>     return datastring
>
> def splitLines(datastring):
>     """Generate list of cleaned lines"""
>     data = [x.strip() for x in datastring.strip().split('\n') if x.strip()]
>     return data
>
>
And in both these functions, the docstring no longer reflects what the
function does. Both should say what kind of data they expect
(unicode?), and the second one should not say that the lines are
"cleaned". What it should say is that the lines in the list have no
leading or trailing whitespace, and that blank lines are dropped.
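As a rough sketch of the docstring changes described above (the wording is only one possibility, and it assumes both functions are meant to receive unicode strings), the two functions might read:

```python
def convertEmDashes(datastring):
    """Return a copy of the unicode string datastring in which every
    em dash (U+2014) is replaced by an ASCII hyphen-minus."""
    return datastring.replace(u'\u2014', u'-')


def splitLines(datastring):
    """Split the unicode string datastring on newlines.

    Returns a list of lines with no leading or trailing whitespace;
    blank and whitespace-only lines are dropped.
    """
    return [x.strip() for x in datastring.strip().split(u'\n') if x.strip()]
```

The u'' prefixes match the original code; they are also accepted by Python 3.3 and later, where they have no effect.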

Once you have multiple "cleanup" functions, the unit tests become much
more important. For example, the order of application of the cleanups
could matter a lot. And pretty soon you'll have to document just what
your public interface is. If your "user" may only call the overall
cleanup() function, then blackbox testing only needs to examine that
one, and whitebox testing can deal with the individual functions
independently.
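
A minimal sketch of such a blackbox test follows. The body of cleanup() is an assumption (the thread has not written it yet); here it simply converts em dashes and then splits into lines. The test only looks at what goes in and what comes out, so the helpers can later be reordered or rewritten without touching it.

```python
import unittest


def convertEmDashes(datastring):
    """Replace each em dash (U+2014) in a unicode string with '-'."""
    return datastring.replace(u'\u2014', u'-')


def splitLines(datastring):
    """Return stripped, non-blank lines of a unicode string as a list."""
    return [x.strip() for x in datastring.strip().split(u'\n') if x.strip()]


def cleanup(datastring):
    """Hypothetical public entry point: convert em dashes first,
    then split the result into stripped, non-blank lines."""
    return splitLines(convertEmDashes(datastring))


class TestCleanup(unittest.TestCase):
    def test_cleanup(self):
        """cleanup should return dash-converted, stripped lines"""
        # Hard-coded source and expected result, as in the tests above,
        # plus a blank line and stray spaces to exercise splitLines.
        srce = u"This line has an em\u2014dash.\n\n  So does this \u2014.  \n"
        expected = [u'This line has an em-dash.', u'So does this -.']
        self.assertEqual(cleanup(srce), expected)
```

Saved as a test module, this can be run with "python -m unittest" followed by the module name. The per-function tests earlier in the thread then serve as the whitebox side.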
DaveA