[Numpy-discussion] ticket 1562 on loadtxt (was Re: Numpy 2.0 schedule)

Wed Mar 2 17:21:23 EST 2011

On Mon, Feb 28, 2011 at 11:45 AM, Bruce Southey <bsouthey at gmail.com> wrote:

>  On 02/28/2011 09:47 AM, Benjamin Root wrote:
>
> On Mon, Feb 28, 2011 at 9:25 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>> On 02/28/2011 09:02 AM, Benjamin Root wrote:
>> [snip]
>> >
>> >
>> > So, is there still no hope in addressing this old bug report of mine?
>> >
>> > http://projects.scipy.org/numpy/ticket/1562
>> >
>> > Ben Root
>> >
>>  I think you need to add more details to this. Do you have an example
>> of the problem that includes code and expected output?
>>
>> Perhaps genfromtxt is more appropriate than loadtxt for what
>> you want:
>>
>> from StringIO import StringIO
>> import numpy as np
>> t = StringIO("1,1.3,abcde\n2,2.3,wxyz\n1\n3,3.3,mnop")
>> data = np.genfromtxt(t,
>> [('myint','i8'),('myfloat','f8'),('mystring','S5')], names =
>> ['myint','myfloat','mystring'], delimiter=",", invalid_raise=False)
>> print 'Bad data raise\n',data
>>
>> This gives the output that skips the incomplete 3rd line:
>>
>> /usr/lib64/python2.7/site-packages/numpy/lib/npyio.py:1507:
>> ConversionWarning: Some errors were detected !
>>     Line #3 (got 1 columns instead of 3)
>>   warnings.warn(errmsg, ConversionWarning)
>> Bad data raise
>> [(1, 1.3, 'abcde') (2, 2.3, 'wxyz') (3, 3.3, 'mnop')]
>>
>>
>> Bruce
>>
>>
> Bruce,
>
> I think you misunderstood the problem I was reporting.
>
> Probably - which is why I asked for more details.
>
>  You can find the discussion thread here:
>
> http://www.mail-archive.com/numpy-discussion@scipy.org/msg26235.html
>
> I have proposed that, at the very least, an example of this problem be added
> to the documentation of loadtxt so that users are aware of this
> possibility.
>
> I did not connect the ticket to that email thread. Setting aside the
> structured array part of your email, I think the argument is essentially
> about what the output of these should be:
> np.loadtxt(StringIO("89.23"))
> np.arange(5)[1]
>
> These return a 0-d array, and there is a rather old argument about that
> (which may address the other part of the ticket). Really, I see this behavior
> as standard, so you could add an example to the documentation to reflect that.
>
>
I agree that this behavior has become standard, and, by-and-large,
desirable.  It just comes with this sneaky pitfall when encountering
single-line files.  Therefore, I have a couple of suggestions that I would
find suitable for resolution of this report.  I will leave it up to the
developers to decide which course to pursue.

1. Add a "mindims" parameter that would default to None (for current
behavior).  The caller can specify the minimum number of dimensions the
resulting array should have, and loadtxt would then call some sort of
function like np.atleast_nd() (I know it doesn't exist, but such a function
might be useful).  The documentation for this keyword param would allude to
the rationale for its use.

2. Keep the current behavior, but possibly not when a dtype is
specified.  Given that the squeeze() was meant for addressing the situation
where the data structure is not known a priori, squeezing a known dtype
seems to go against this rationale.

3. Keep the current behavior, but add some documentation for loadtxt() that
illustrates the problem and shows the usage of a function like
np.atleast_2d().  I would be willing to write up such an example.
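
As a rough sketch of the pitfall and of the workaround in suggestion 3,
something like the following could go into the docs.  It is written in Python 3
syntax (io.StringIO rather than the StringIO module).  Note that atleast_nd()
here is a hypothetical helper, not an existing NumPy function, and its choice to
prepend new axes is an assumption modeled on what np.atleast_2d() does.

```python
from io import StringIO

import numpy as np


def atleast_nd(arr, ndim):
    """Hypothetical helper (not part of NumPy): return a view of `arr`
    with at least `ndim` dimensions, prepending length-1 axes as needed."""
    arr = np.asarray(arr)
    if arr.ndim >= ndim:
        return arr
    # Assumption: new axes go in front, as np.atleast_2d does.
    return arr.reshape((1,) * (ndim - arr.ndim) + arr.shape)


# A two-line file comes back as a 2-D array of rows...
multi = np.loadtxt(StringIO("1 2 3\n4 5 6"))

# ...but a single-line file is squeezed down to 1-D, so code that
# iterates over "rows" would iterate over individual values instead.
single = np.loadtxt(StringIO("1 2 3"))

# Workaround from suggestion 3: force a 2-D result after loading.
single_2d = np.atleast_2d(single)

print(multi.shape, single.shape, single_2d.shape)
```

With the helper, atleast_nd(single, 2) gives the same shape as
np.atleast_2d(single), which is roughly what a "mindims" parameter would do
internally under suggestion 1.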

>
>  In addition, loadtxt fails on empty files even when provided with a
> dtype.  I believe genfromtxt also fails in this case.
>
> Ben Root
>
>   Errors on empty files probably should be a new bug report as that was
> not in the ticket.
>
>
Done:  http://projects.scipy.org/numpy/ticket/1752

Thanks,
Ben Root