Zachary Pincus wrote:
Specifically, on line 115 in LineSplitter, we have:
    self.delimiter = delimiter.strip() or None
so if I pass in, say, '\t' as the delimiter, self.delimiter gets set to None, which then causes the default behavior of any-whitespace-is-delimiter to be used. This makes lines like "Gene Name\tPubMed ID \tStarting Position" get split wrong, even when I explicitly pass in '\t' as the delimiter!
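To make the effect concrete, here is a minimal sketch using plain Python strings rather than the actual LineSplitter code. The helper name normalize_delimiter is hypothetical; it only mimics the assignment quoted above.

```python
def normalize_delimiter(delimiter):
    # Hypothetical helper mimicking the quoted assignment:
    # a whitespace-only delimiter strips to '' and therefore becomes None.
    return delimiter.strip() or None

line = "Gene Name\tPubMed ID \tStarting Position"

tab_result = normalize_delimiter("\t")    # the explicit tab is lost
comma_result = normalize_delimiter(",")   # non-whitespace delimiters survive

# With None, str.split breaks on every run of whitespace,
# so the spaces inside each field also act as separators.
split_with_none = line.split(tab_result)

# Splitting on the literal tab keeps the three fields intact.
split_with_tab = line.split("\t")
```

Here split_with_none has six pieces while split_with_tab has the three intended fields, which is the mismatch described above.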
Similarly, I believe that some of the tests are formulated wrong:
    def test_nodelimiter(self):
        "Test LineSplitter w/o delimiter"
        strg = " 1 2 3 4  5 # test"
        test = LineSplitter(' ')(strg)
        assert_equal(test, ['1', '2', '3', '4', '5'])
I think that treating an explicitly-passed-in ' ' delimiter as identical to 'no delimiter' is a bad idea. If I say that ' ' is the delimiter, or '\t' is the delimiter, this should be treated *just* like ',' being the delimiter, where the expected output is: ['1', '2', '3', '4', '', '5']
At least, that's what I would expect. Treating contiguous blocks of whitespace as single delimiters is perfectly reasonable when None is provided as the delimiter, but when an explicit delimiter has been provided, it strikes me that the code shouldn't try to further interpret it...
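Python's built-in str.split already draws the same distinction, which may be a useful reference point. The sketch below is not LineSplitter itself, and it assumes the test string has two spaces between 4 and 5, as the expected output above implies.

```python
row = "1 2 3 4  5"  # assumed: two spaces between 4 and 5

# No delimiter given: runs of whitespace count as a single separator.
collapsed = row.split()

# Explicit ' ' delimiter: every space separates, so an empty field appears.
explicit_space = row.split(" ")

# The same row with ',' as the delimiter behaves identically.
explicit_comma = "1,2,3,4,,5".split(",")
```

With an explicit delimiter, both explicit_space and explicit_comma keep the empty field between '4' and '5', while collapsed drops it.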
Does anyone else have any opinion here?
I agree. If the user explicitly passes something as a delimiter, we should use it and not try to be too smart.
+1
Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma