
On 03/31/2011 10:08 AM, Ralf Gommers wrote:
On Thu, Mar 31, 2011 at 5:03 PM, Bruce Southey <bsouthey@gmail.com> wrote:
On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes <paul.anton.letnes@gmail.com> wrote:
On 26 March 2011, at 21:44, Derek Homeier wrote:
Hi Paul,
having had a look at the other tickets you dug up,
[snip]
1071: It is not clear to me whether loadtxt is supposed to support missing values in the fashion indicated in the ticket. In principle it should at least allow you to do so through converters, as described there. The problem is that the default delimiter is described as 'any whitespace', which in the present implementation obviously includes any number of blanks or tabs. Whitespace is therefore treated differently from delimiters like ',' or '&'. I'd reckon there are too many people actually relying on this behaviour to silently change it (e.g. I know plenty of tables with columns separated by either one or several tabs, depending on the length of the previous entry). But the tab is apparently also treated differently if explicitly specified with "delimiter='\t'". In that case a converter à la {2: lambda s: float(s or 'Nan')} works for fields in the middle of the line, but not at the end, which clearly warrants improvement. I've prepared a patch that works for Python3 as well.
Great!
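To make the delimiter behaviour discussed above concrete, here is a minimal pure-Python sketch. It is not loadtxt's actual code: `split_fields` and `to_float` are hypothetical helpers, and the idea that a trailing empty field is lost because the line is stripped before splitting is only an assumption about one plausible cause.

```python
def split_fields(line, delimiter=None, strip_line=True):
    """Hypothetical helper: split one text row into fields.

    delimiter=None mimics 'any whitespace' splitting via str.split().
    strip_line=True strips the whole line first (an assumption about
    how a reader might lose a trailing empty field).
    """
    line = line.rstrip("\n")
    if strip_line:
        line = line.strip()
    return line.split(delimiter)


def to_float(s):
    # Same idea as the converter quoted above: empty field -> NaN.
    return float(s or "nan")


# Runs of whitespace collapse, so the empty middle field disappears.
whitespace_fields = split_fields("1\t\t3\n")

# An explicit tab delimiter keeps the empty middle field.
middle_fields = split_fields("1\t\t3\n", delimiter="\t")

# Stripping the line first removes a trailing tab, losing the last field.
stripped_end = split_fields("1\t2\t\n", delimiter="\t")

# Without stripping, the trailing empty field survives.
kept_end = split_fields("1\t2\t\n", delimiter="\t", strip_line=False)

print(whitespace_fields, middle_fields, stripped_end, kept_end)
print([to_float(f) for f in kept_end])
```

The sketch only shows why whitespace splitting cannot represent an empty column, and why an explicit delimiter can do so as long as the trailing delimiter is not stripped away first.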
This is an invalid ticket because the docstring clearly states, in three different yet critical places, that missing values are not handled here:
"Each row in the text file must have the same number of values."
"genfromtxt : Load data with missing values handled as specified."
"This function aims to be a fast reader for simply formatted files. The `genfromtxt` function provides more sophisticated handling of, e.g., lines with missing values."
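As a small illustration of the split the docstring describes, the following sketch reads a comma-separated table with one empty field using genfromtxt. The sample data is made up for this example, and the sketch assumes NumPy is installed.

```python
from io import StringIO

import numpy as np

# Made-up sample data: the second row has an empty middle field.
text = "1,2,3\n4,,6\n"

# With the default float dtype, genfromtxt treats the empty field as
# missing and fills it with nan.
data = np.genfromtxt(StringIO(text), delimiter=",")

print(data)
```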
Really, I am trying to separate the usage of loadtxt and genfromtxt to avoid unnecessary duplication and confusion. Part of this is historical, because loadtxt was added in 2007 and genfromtxt was added in 2009. So certain features of loadtxt have been 'kept' for backwards compatibility, yet these features can be 'abused' to handle missing data. But I really consider that any missing values should cause loadtxt to fail.
I agree with you Bruce, but it would be easier to discuss this on the tickets instead of here. Could you add your comments there please?
Ralf
'Easier' seems a contradiction when you have to use a captcha... Sure, I will add more comments there.
Bruce