Re: [Numpy-discussion] loadtxt/savetxt tickets

April 4, 2011


On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsouthey@gmail.com> wrote:
...
On 03/31/2011 12:02 PM, Derek Homeier wrote:
...
On 31 Mar 2011, at 17:03, Bruce Southey wrote:
...
This is an invalid ticket because the docstring clearly states, in
3 different yet critical places, that missing values are not handled
here:
"Each row in the text file must have the same number of values."
"genfromtxt : Load data with missing values handled as specified."
"This function aims to be a fast reader for simply formatted files.
The `genfromtxt` function provides more sophisticated handling of,
e.g., lines with missing values."
Really I am trying to separate the usage of loadtxt and genfromtxt to
avoid unnecessary duplication and confusion. Part of this is
historical because loadtxt was added in 2007 and genfromtxt was added
in 2009. So really certain features of loadtxt have been 'kept' for
backwards compatibility purposes yet these features can be 'abused' to
handle missing data. But I really consider that any missing values
should cause loadtxt to fail.
OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
you could probably say also for historical reasons since I have not
used genfromtxt much so far.
Anyway, the docstring statement "Converters can also be used to
provide a default value for missing data:"
then appears quite misleading, or an invitation to abuse, if you will.
It would be better to remove this from the documentation, or to
explicitly discourage users from using converters instead of genfromtxt
(I don't see how you could completely prevent using converters in
this way).
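
For illustration, here is a minimal sketch of the two approaches being contrasted. The comma-separated `data` string and the variable names are invented for this example and do not come from the ticket. It reads the same text once with loadtxt plus a converter that supplies a default, and once with genfromtxt, which handles the empty field on its own.

```python
from io import StringIO

import numpy as np

# Illustrative input (an assumption, not from the ticket): the second
# row has an empty middle field.
data = "1,2,3\n4,,6\n"

# loadtxt: the converter for column 1 replaces an empty field with 0.
# This is the pattern the docstring sentence quoted above describes.
with_converter = np.loadtxt(
    StringIO(data),
    delimiter=",",
    converters={1: lambda s: float(s.strip() or 0)},
)

# genfromtxt: an empty field is treated as missing without a converter.
# By default it becomes nan; filling_values chooses another default.
default_nan = np.genfromtxt(StringIO(data), delimiter=",")
with_genfromtxt = np.genfromtxt(StringIO(data), delimiter=",", filling_values=0)

print(with_converter)
print(default_nan)
print(with_genfromtxt)
```

Both routes give the same array here, but the loadtxt version only works because the caller knew in advance which column could be empty.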
...
The patch is incorrect because it should not include a space in the
split() as indicated in the comment by the original reporter.
The split('\r\n') alone caused test_dtype_with_object(self) to fail,
probably because it relies on stripping the blanks. But maybe the test
is ill-formed?
...
Of course a corrected patch alone still is not sufficient to address the
problem without the user providing the correct converter. Also you
start to run into problems with multiple delimiters (such as one space
versus two spaces) so you start down the path to add all the features
that duplicate genfromtxt.
Given that genfromtxt provides that functionality more conveniently,
I agree again users should be encouraged to use this instead of
converters.
But the actual tab problem in fact causes an issue not related to
missing values at all (well, depending on what you call a missing value).
I am describing an example on the ticket.
Cheers,
                                      Derek
Okay, I see that 1071 got closed, which I am fine with.
I think that your following example should be a test because the two
spaces should not be removed with a tab delimiter:
np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
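
To make the stripping issue concrete, here is a minimal sketch in plain Python. The two helper functions are hypothetical and are not loadtxt's actual line splitter; they only contrast stripping all surrounding whitespace with stripping just the line ending before splitting on the tab delimiter.

```python
def split_strip_all(line, delimiter):
    # Strip every leading/trailing whitespace character, then split.
    # With a tab delimiter this also eats leading and trailing tabs.
    return line.strip().split(delimiter)


def split_strip_newline(line, delimiter):
    # Strip only the line ending, then split; blanks and tabs survive.
    return line.rstrip("\r\n").split(delimiter)


text = "aa\tbb\n \t \ncc\t"
for line in text.split("\n"):
    print(repr(line),
          split_strip_all(line, "\t"),
          split_strip_newline(line, "\t"))

# ' \t ' -> [''] when everything is stripped, [' ', ' '] otherwise
# 'cc\t' -> ['cc'] when everything is stripped, ['cc', ''] otherwise
```

Only the second helper keeps two fields on every row, which is what a two-column dtype needs.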
Make a test and we'll put it in.

Chuck

Charles R Harris