On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsouthey@gmail.com> wrote:

On 03/31/2011 12:02 PM, Derek Homeier wrote:

> On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>
>> This is an invalid ticket because the docstring clearly states, in
>> 3 different yet critical places, that missing values are not handled
>> here:
>>
>> "Each row in the text file must have the same number of values."
>> "genfromtxt : Load data with missing values handled as specified."
>> "This function aims to be a fast reader for simply formatted files. The
>> `genfromtxt` function provides more sophisticated handling of, e.g.,
>> lines with missing values."
>>
>> Really I am trying to separate the usage of loadtxt and genfromtxt to
>> avoid unnecessary duplication and confusion. Part of this is
>> historical because loadtxt was added in 2007 and genfromtxt was added
>> in 2009. So really certain features of loadtxt have been 'kept' for
>> backwards compatibility purposes yet these features can be 'abused' to
>> handle missing data. But I really think that any missing values
>> should cause loadtxt to fail.
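To illustrate that division of labour, here is a minimal sketch, assuming Python 3 and a current NumPy (the sample data is made up): an empty field makes loadtxt raise ValueError, while genfromtxt treats it as missing and fills it with nan, its default for float columns.

```python
from io import StringIO

import numpy as np

# Made-up comma-separated sample; row 2 has an empty (missing) field.
data = "1,2,3\n4,,6\n"

# loadtxt has no notion of missing values: converting '' to float fails.
try:
    np.loadtxt(StringIO(data), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# genfromtxt treats the empty field as missing and fills it with nan.
filled = np.genfromtxt(StringIO(data), delimiter=",")
```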
>>
> OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
> probably also for historical reasons, since I have not used
> genfromtxt much so far.
> Anyway the docstring statement "Converters can also be used to
> provide a default value for missing data:"
> then appears quite misleading, or an invitation to abuse, if you will.
> This should then be removed from the documentation, or users should be
> explicitly discouraged from using converters instead of genfromtxt
> (I don't see how you could completely prevent using converters in
> this way).
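For comparison, a sketch of the converter workaround that docstring sentence invites, next to the genfromtxt equivalent. The helper `blank_to_default` and the default of -1.0 are hypothetical; since loadtxt has passed converters either bytes or str depending on the NumPy version and the `encoding` argument, the helper accepts both.

```python
from io import StringIO

import numpy as np

# Made-up sample with empty fields in rows 2 and 3.
data = "1,2,3\n4,,6\n7,8,\n"

def blank_to_default(field, default=-1.0):
    # Hypothetical converter: accept bytes or str, map blanks to a default.
    if isinstance(field, bytes):
        field = field.decode("latin1")
    field = field.strip()
    return float(field) if field else default

# The "abuse": a converter for every column substitutes the default.
converters = {col: blank_to_default for col in range(3)}
via_loadtxt = np.loadtxt(StringIO(data), delimiter=",", converters=converters)

# The intended route: genfromtxt with an explicit filling value.
via_genfromtxt = np.genfromtxt(StringIO(data), delimiter=",", filling_values=-1.0)
```

Both calls should yield the same 3x3 array here; genfromtxt, however, needs no per-column helper to get there.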
>
>> The patch is incorrect because it should not include a space in the
>> split() as indicated in the comment by the original reporter. Of
> The split('\r\n') alone caused test_dtype_with_object(self) to fail,
> probably because it relies on stripping the blanks. But maybe the test
> is ill-formed?
>
>> course a corrected patch alone is still not sufficient to address the
>> problem without the user providing the correct converter. Also you
>> start to run into problems with multiple delimiters (such as one space
>> versus two spaces) so you start down the path to add all the features
>> that duplicate genfromtxt.
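The one-space-versus-two-spaces problem can be seen with plain Python string splitting. A minimal sketch (sample line made up) contrasting an explicit single-space delimiter with splitting on runs of whitespace, which is what loadtxt does when `delimiter` is left at its default of None:

```python
from io import StringIO

import numpy as np

line = "1  2 3"  # two spaces between the first two values

# Splitting on an explicit single space yields an empty field between
# the two consecutive spaces, which looks like a missing value.
explicit = line.split(" ")

# Splitting on runs of whitespace collapses them into one separator.
runs = line.split()

# loadtxt's default delimiter=None also treats any whitespace run
# as a single separator.
arr = np.loadtxt(StringIO("1  2 3\n4 5  6\n"))
```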
> Given that genfromtxt provides that functionality more conveniently,
> I again agree that users should be encouraged to use it instead of
> converters.
> But the actual tab problem in fact causes an issue not related to
> missing values at all (well, depending on what you call a missing
> value). I am describing an example on the ticket.
>
> Cheers,
> Derek
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

Okay, I see that 1071 got closed, which I am fine with.

I think your following example should become a test, because the two
spaces should not be removed when the delimiter is a tab:

    import numpy as np
    from io import StringIO
    np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
               dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
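Why the two spaces matter can be sketched without NumPy: if each line is stripped of all surrounding whitespace before being split on the tab, the row " \t " loses both of its fields, whereas removing only the line terminator keeps them. The two helper functions are hypothetical illustrations, not loadtxt's actual code.

```python
def split_strip_all(line, delimiter="\t"):
    # Hypothetical lossy variant: strip all surrounding whitespace
    # (spaces and tabs included) before splitting.
    return line.strip().split(delimiter)

def split_strip_eol(line, delimiter="\t"):
    # Hypothetical variant: remove only the line terminator, keep spaces.
    return line.rstrip("\r\n").split(delimiter)

# The rows from the example above.
rows = ["aa\tbb\n", " \t \n", "cc\t"]
lossy = [split_strip_all(r) for r in rows]
kept = [split_strip_eol(r) for r in rows]
```

With the lossy variant the second and third rows no longer have two values each, which conflicts with the requirement that every row have the same number of values.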

Make a test and we'll put it in.

Chuck