Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

2 Dec 2008

Zachary Pincus wrote:
...
Specifically, on line 115 in LineSplitter, we have:
             self.delimiter = delimiter.strip() or None
so if I pass in, say, '\t' as the delimiter, self.delimiter gets set to
None, which then causes the default behavior of
any-whitespace-is-delimiter to be used. This makes lines like
"Gene Name\tPubMed ID\tStarting Position" get split wrong, even when I
explicitly pass in '\t' as the delimiter!
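
To make the effect concrete, here is a minimal standalone sketch in plain Python. It does not use LineSplitter itself. It only reproduces the `delimiter.strip() or None` expression quoted above and relies on the documented behavior of str.split, so the variable names are illustrative assumptions:

```python
# Reproduces only the quoted expression, outside of LineSplitter.
delimiter = '\t'
effective = delimiter.strip() or None   # '\t'.strip() is '', which is falsy
print(effective)

line = "Gene Name\tPubMed ID\tStarting Position"

# split(None) splits on any run of whitespace, so the spaces inside
# each field are treated as delimiters as well.
whitespace_split = line.split(effective)

# split('\t') splits only on tabs and keeps the three fields intact.
tab_split = line.split(delimiter)

print(whitespace_split)
print(tab_split)
```

The first split yields six fragments and the second yields the three intended fields.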
Similarly, I believe that some of the tests are formulated wrong:
     def test_nodelimiter(self):
         "Test LineSplitter w/o delimiter"
         strg = " 1 2 3 4  5 # test"
         test = LineSplitter(' ')(strg)
         assert_equal(test, ['1', '2', '3', '4', '5'])
I think that treating an explicitly-passed-in ' ' delimiter as  
identical to 'no delimiter' is a bad idea. If I say that ' ' is the  
delimiter, or '\t' is the delimiter, this should be treated *just*  
like ',' being the delimiter, where the expected output is:
['1', '2', '3', '4', '', '5']
At least, that's what I would expect. Treating contiguous blocks of  
whitespace as single delimiters is perfectly reasonable when None is  
provided as the delimiter, but when an explicit delimiter has been  
provided, it strikes me that the code shouldn't try to
further-interpret it...
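
A rough sketch of the behavior argued for above, in plain Python. The `split_line` helper and its parameters are hypothetical and are not the LineSplitter API. Stripping the ends of the line before splitting is an assumption made so that the output matches the expected list given earlier:

```python
def split_line(line, delimiter=None, comments='#'):
    """Hypothetical helper: honor an explicit delimiter verbatim."""
    # Drop everything after the comment marker, then trim the line ends.
    line = line.split(comments)[0].strip()
    if not line:
        return []
    # delimiter=None: runs of whitespace act as a single separator.
    # Any explicit string, including ' ' or '\t', is used exactly as given.
    return line.split(delimiter)


strg = " 1 2 3 4  5 # test"
default_fields = split_line(strg)
space_fields = split_line(strg, ' ')
comma_fields = split_line("1,2,3,4,,5 # test", ',')

print(default_fields)
print(space_fields)
print(comma_fields)
```

With this approach an explicit ' ' behaves just like ',': two adjacent delimiters produce an empty field. One caveat of the sketch is that trimming the line ends would also discard a leading or trailing tab, so a real implementation would probably need to be more careful when the delimiter is itself whitespace.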
Does anyone else have any opinion here?

I agree.  If the user explicitly passes something as a delimiter, we
should use it and not try to be too smart.

+1

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma