On Mon, Feb 28, 2011 at 11:45 AM, Bruce Southey <bsouthey@gmail.com> wrote:
On 02/28/2011 09:47 AM, Benjamin Root wrote:
On Mon, Feb 28, 2011 at 9:25 AM, Bruce Southey <bsouthey@gmail.com> wrote:
On 02/28/2011 09:02 AM, Benjamin Root wrote:
[snip]
I think you need to add more details to this. Do you have an example
of the problem that includes code and expected output?
Perhaps genfromtxt is more appropriate than loadtxt for what you want:
from StringIO import StringIO
import numpy as np
t = StringIO("1,1.3,abcde\n2,2.3,wxyz\n1\n3,3.3,mnop")
data = np.genfromtxt(t,
                     dtype=[('myint','i8'),('myfloat','f8'),('mystring','S5')],
                     names=['myint','myfloat','mystring'],
                     delimiter=",", invalid_raise=False)
print 'Bad data raise\n',data
This gives the output that skips the incomplete 3rd line:
/usr/lib64/python2.7/site-packages/numpy/lib/npyio.py:1507:
ConversionWarning: Some errors were detected !
Line #3 (got 1 columns instead of 3)
warnings.warn(errmsg, ConversionWarning)
Bad data raise
[(1, 1.3, 'abcde') (2, 2.3, 'wxyz') (3, 3.3, 'mnop')]
Bruce
Bruce,
I think you misunderstood the problem I was reporting.
Probably - which is why I asked for more details.
I did not connect the ticket to that email thread. Setting aside the
structured array part of your email, I think the argument is
essentially about what the output of the following should be:
np.loadtxt(StringIO("89.23"))
np.arange(5)[1]
These return a 0-d array and a NumPy scalar, respectively, and there
is a rather old argument about that (which may address the other part
of the ticket). Really, I see this behavior as standard, so you could
add an example to the documentation to reflect that.
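
For readers who want to check what the two expressions above return, here is a minimal sketch. It assumes Python 3 and a recent NumPy, so it uses io.StringIO where the thread used the Python 2 StringIO module. The variable names are only illustrative.

```python
from io import StringIO  # Python 3 replacement for the Python 2 StringIO module

import numpy as np

# A file holding a single number is squeezed down to a 0-d array.
value = np.loadtxt(StringIO("89.23"))
print(type(value).__name__, value.ndim, value.shape)

# Indexing one element of a 1-D array yields a NumPy scalar, which also
# has zero dimensions but is not an ndarray.
element = np.arange(5)[1]
print(type(element).__name__, np.ndim(element))

# A 0-d result has no length, so code written for multi-row input breaks.
try:
    len(value)
except TypeError as exc:
    print("len() failed:", exc)
```

In both cases the result has zero dimensions, which is why code that iterates over rows or calls len() works on multi-line input and fails on a single value.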
I agree that this behavior has become standard and, by and large, desirable. It just comes with this sneaky pitfall when encountering single-line files. Therefore, I have a couple of suggestions that I would find suitable for resolution of this report. I will leave it up to the developers to decide which course to pursue.
1. Add a "mindims" parameter that would default to None (for current behavior). The caller can specify the minimum number of dimensions the resulting array should have, and loadtxt would then call some sort of function like np.atleast_nd() (I know it doesn't exist, but such a function might be useful). The documentation for this keyword param would allude to the rationale for its use.
2. Keep the current behavior, but possibly not when a dtype is specified. Given that the squeeze() was meant for addressing the situation where the data structure is not known a priori, squeezing a known dtype seems to go against this rationale.
3. Keep the current behavior, but add some documentation for loadtxt() that illustrates the problem and shows the usage of a function like np.atleast_2d(). I would be willing to write up such an example.
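
The single-line pitfall and the workarounds from suggestions 1 and 3 can be sketched roughly as follows. This assumes Python 3 and a recent NumPy. The atleast_nd function is a hypothetical helper written only for this illustration and is not part of NumPy. The ndmin argument shown at the end is something loadtxt gained in later NumPy releases, not the "mindims" parameter proposed here.

```python
from io import StringIO

import numpy as np


def atleast_nd(arr, n):
    """Hypothetical helper: prepend length-1 axes until arr has >= n dims."""
    arr = np.asarray(arr)
    if arr.ndim >= n:
        return arr
    return arr.reshape((1,) * (n - arr.ndim) + arr.shape)


# Two rows of three columns load as a 2-D array, as most callers expect.
multi = np.loadtxt(StringIO("1.0 2.0 3.0\n4.0 5.0 6.0"))
print(multi.shape)

# One row of three columns is squeezed to 1-D, so row-wise code misbehaves.
single = np.loadtxt(StringIO("1.0 2.0 3.0"))
print(single.shape)

# Suggestion 3: restore the row axis after loading.
rows = np.atleast_2d(single)
print(rows.shape)

# Suggestion 1, emulated outside loadtxt with the hypothetical helper.
rows_nd = atleast_nd(single, 2)
print(rows_nd.shape)

# Later NumPy releases let the caller request this directly via ndmin.
rows_ndmin = np.loadtxt(StringIO("1.0 2.0 3.0"), ndmin=2)
print(rows_ndmin.shape)
```

Prepending the new axis, rather than appending it, matches what np.atleast_2d does for 1-D input, so a single line of N columns comes back with shape (1, N) and can be handled like any multi-row file.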
In addition, loadtxt fails on empty files even when provided with a
dtype. I believe genfromtxt also fails in this case.
Ben Root
Errors on empty files should probably be a new bug report, as that
was not in the ticket.