[Numpy-discussion] loadtxt/savetxt tickets

Derek Homeier derek at astro.physik.uni-goettingen.de
Thu Mar 31 13:02:41 EDT 2011


On 31 Mar 2011, at 17:03, Bruce Southey wrote:

> This is an invalid ticket because the docstring clearly states, in
> three different yet critical places, that missing values are not
> handled here:
>
> "Each row in the text file must have the same number of values."
> "genfromtxt : Load data with missing values handled as specified."
> "   This function aims to be a fast reader for simply formatted files.
>    The `genfromtxt` function provides more sophisticated handling of,
>    e.g., lines with missing values."
>
> Really I am trying to separate the usage of loadtxt and genfromtxt to
> avoid unnecessary duplication and confusion. Part of this is
> historical because loadtxt was added in 2007 and genfromtxt was added
> in 2009. So really certain features of loadtxt have been 'kept' for
> backwards compatibility purposes yet these features can be 'abused' to
> handle missing data. But I really consider that any missing values
> should cause loadtxt to fail.
>
OK, I was not aware of the design split between loadtxt and genfromtxt -
you could say for historical reasons as well, since I have not used
genfromtxt much so far.
Anyway, the docstring statement "Converters can also be used to
provide a default value for missing data:"
then appears quite misleading, or an invitation to abuse, if you will.
It should either be removed from the documentation, or users should be
explicitly discouraged from using converters instead of genfromtxt
(I don't see how you could completely prevent converters from being
used this way).
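
To make the two approaches concrete, here is a minimal sketch. The sample
data and variable names are made up for illustration, and the converter is
the kind of "default value" converter the docstring sentence above alludes
to; the exact exception text and edge-case behaviour may differ between
NumPy versions.

```python
from io import StringIO

import numpy as np

# Hypothetical sample data: the second row has an empty middle field.
text = "1,2,3\n4,,6\n"

# Plain loadtxt is expected to reject the empty field.
try:
    np.loadtxt(StringIO(text), delimiter=",")
    plain_loadtxt_failed = False
except ValueError:
    plain_loadtxt_failed = True

# loadtxt with a converter that substitutes 0 for an empty field in column 1.
via_loadtxt = np.loadtxt(
    StringIO(text),
    delimiter=",",
    converters={1: lambda s: float(s or 0)},
)

# genfromtxt reads the same data without any converter; filling_values
# chooses what replaces the missing entry.
via_genfromtxt = np.genfromtxt(StringIO(text), delimiter=",", filling_values=0)
```

Both calls end up with the same array here; the difference is that
genfromtxt is designed for this case, while the converter has to be written
per column by the user.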

> The patch is incorrect because it should not include a space in the
> split() as indicated in the comment by the original reporter. Of

The split('\r\n') alone caused test_dtype_with_object(self) to fail,
probably because it relies on stripping the blanks. But maybe the test
is ill-formed?
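
As an illustration of why the choice of stripped characters matters, the
sketch below uses plain Python string methods only. It is not the loadtxt
source, and it assumes the call under discussion strips characters from
each line before the line is split on the delimiter; the sample lines are
hypothetical.

```python
# Hypothetical tab-delimited line whose last field is empty.
line = "1\t2\t\r\n"

# Stripping all whitespace also removes the trailing tab,
# so the empty last field disappears.
fields_all = line.strip().split("\t")

# Stripping only the line ending keeps the trailing tab,
# so the empty last field survives as ''.
fields_eol = line.strip("\r\n").split("\t")

# Hypothetical line padded with blanks at both ends.
padded = " 1\t2\t \r\n"

# Without the space in the strip set, the blanks stay attached to the fields.
fields_padded_eol = padded.strip("\r\n").split("\t")

# With the space included, the blanks are removed but the tab is kept.
fields_padded_blank = padded.strip(" \r\n").split("\t")
```

With only '\r\n' stripped, any code that relied on leading or trailing
blanks being removed sees different field contents, which would be
consistent with a test failing once the space is dropped.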

> course a corrected patch alone still is not sufficient to address the
> problem without the user providing the correct converter. Also you
> start to run into problems with multiple delimiters (such as one space
> versus two spaces) so you start down the path to add all the features
> that duplicate genfromtxt.

Given that genfromtxt provides that functionality more conveniently,
I agree again that users should be encouraged to use it instead of
converters.
But the tab problem in fact causes an issue not related to missing
values at all (well, depending on what you call a missing value).
I am describing an example on the ticket.

Cheers,
					Derek



