
On Dec 2, 2008, at 3:12 PM, Ryan May wrote:
Pierre GM wrote:
Well, looks like the attachment is too big, so here's the implementation. The tests will come in another message.
A couple of quick nitpicks:
1) On line 186 (in the NameValidator class), you use excludelist.append() to append a list to the end of a list. I think you meant to use excludelist.extend()
Good call.
2) When validating a list of names, why do you insist on lower casing them? (I'm referring to the call to lower() on line 207). On one hand, this would seem nicer than all upper case, but on the other hand this can cause confusion for someone who sees certain casing of names in the file and expects that data to be laid out the same.
I recall a life where names were case-insensitives, so 'dates' and 'Dates' and 'DATES' were the same field. It should be easy enough to get rid of that limitations, or add a parameter for case-sensitivity On Dec 2, 2008, at 2:47 PM, Zachary Pincus wrote:
Specifically, on line 115 in LineSplitter, we have: self.delimiter = delimiter.strip() or None so if I pass in, say, '\t' as the delimiter, self.delimiter gets set to None, which then causes the default behavior of any-whitespace-is- delimiter to be used. This makes lines like "Gene Name\tPubMed ID \tStarting Position" get split wrong, even when I explicitly pass in '\t' as the delimiter!
OK, I'll check that.
I think that treating an explicitly-passed-in ' ' delimiter as identical to 'no delimiter' is a bad idea. If I say that ' ' is the delimiter, or '\t' is the delimiter, this should be treated *just* like ',' being the delimiter, where the expected output is: ['1', '2', '3', '4', '', '5']
Valid point. Well, all, stay tuned for yet another "yet another implementation..."
Other than those, it's working fine for me here.
Ryan