> On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>
>> This is an invalid ticket because the docstring clearly states that in
>> 3 different, yet critical places, that missing values are not handled
>> here:
>>
>> "Each row in the text file must have the same number of values."
>> "genfromtxt : Load data with missing values handled as specified."
>> "This function aims to be a fast reader for simply formatted
>> files. The `genfromtxt` function provides more sophisticated handling
>> of, e.g., lines with missing values."
>>
>> Really I am trying to separate the usage of loadtxt and genfromtxt to
>> avoid unnecessary duplication and confusion. Part of this is
>> historical because loadtxt was added in 2007 and genfromtxt was added
>> in 2009. So really certain features of loadtxt have been 'kept' for
>> backwards compatibility purposes yet these features can be 'abused' to
>> handle missing data. But I really consider that any missing values
>> should cause loadtxt to fail.
>>
> OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
> you could probably say also for historical reasons since I have not
> used genfromtxt much so far.
> Anyway the docstring statement "Converters can also be used to
> provide a default value for missing data:"
> then appears quite misleading, or an invitation to abuse, if you will.
> This should better be removed from the documentation then, or users
> explicitly discouraged from using converters instead of genfromtxt
> (I don't see how you could completely prevent using converters in
> this way).
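
To make the two approaches concrete, here is a minimal sketch that reads
the same text once with genfromtxt and once with a loadtxt converter. The
sample data, the fill value of -1.0 and the helper name fill_missing are
made up for illustration; nothing here is taken from the ticket or from
the loadtxt docstring.

```python
import io
import numpy as np

# Hypothetical sample data: the second row has an empty middle field.
TEXT = "1,2,3\n4,,6\n"

def fill_missing(field):
    """Hypothetical converter: map an empty field to -1.0."""
    if isinstance(field, bytes):  # older NumPy versions pass bytes
        field = field.decode()
    field = field.strip()
    return float(field) if field else -1.0

# genfromtxt handles the missing value through its own keyword.
via_genfromtxt = np.genfromtxt(io.StringIO(TEXT), delimiter=",",
                               filling_values=-1.0)

# loadtxt only copes because a converter is supplied for column 1.
via_converter = np.loadtxt(io.StringIO(TEXT), delimiter=",",
                           converters={1: fill_missing})
```

Both calls should give the same array here, but the converter has to be
written per column, which is the kind of use being discussed above.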
>
>> The patch is incorrect because it should not include a space in the
>> split() as indicated in the comment by the original reporter. Of
> The split('\r\n') alone caused test_dtype_with_object(self) to fail,
> probably because it relies on stripping the blanks. But maybe the test
> is ill-formed?
>
>> course a corrected patch alone still is not sufficient to address the
>> problem without the user providing the correct converter. Also you
>> start to run into problems with multiple delimiters (such as one space
>> versus two spaces) so you start down the path to add all the features
>> that duplicate genfromtxt.
> Given that genfromtxt provides that functionality more conveniently,
> I agree again users should be encouraged to use this instead of
> converters.
> But the actual tab-problem causes in fact an issue not related to
> missing values at all (well, depending on what you call a missing
> value).
> I am describing an example on the ticket.
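
The "one space versus two spaces" point can be seen with plain Python
string methods, without involving NumPy at all. This is only an
illustration with a made-up input line, not the splitting code that
loadtxt actually uses.

```python
# Hypothetical input: one space after "1", two spaces after "2".
line = "1 2  3"

# Splitting on a single explicit space yields an empty field
# wherever two spaces are adjacent.
explicit_space = line.split(" ")

# Splitting with no argument collapses any run of whitespace.
any_whitespace = line.split()
```

Whether the empty string in the first result is a missing value or just
extra padding cannot be decided from the line alone.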
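
As a rough illustration of how stripping interacts with tab-delimited
input, the sketch below compares stripping all whitespace with stripping
only line endings before splitting. The input line is invented and this
is an assumption about the kind of case involved; it is not the example
from the ticket and not the loadtxt source.

```python
# Hypothetical tab-delimited line whose last field is empty.
line = "1\t2\t\r\n"

# Stripping all whitespace also removes the trailing tab,
# so the empty third field disappears.
strip_all = line.strip().split("\t")

# Stripping only the line ending keeps the trailing tab,
# so the third (empty) field is preserved.
strip_eol = line.strip("\r\n").split("\t")
```

The first variant silently changes the number of columns, which is a
separate matter from how an empty field should be converted.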
>
> Cheers,
> Derek
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion