[Numpy-discussion] loadtxt() behavior on single-line files
Warren Weckesser
warren.weckesser at enthought.com
Thu Jun 24 14:00:54 EDT 2010
Benjamin Root wrote:
> Hi,
>
> I was having the hardest time trying to figure out an intermittent bug
> in one of my programs. Essentially, in some situations, it was
> throwing an error saying that the array object was not an array. It
> took me a while, but then I figured out that my program was assuming
> that the object returned from a loadtxt() call was always a structured
> array (I was using dtypes). However, if the data file being loaded
> only had one data record, then all you get back is a structured record.
>
> import numpy as np
> from StringIO import StringIO
>
> strData = StringIO("89.23 47.2\n13.2 42.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "Length Two"
> print a
> print a.shape
> print len(a)
>
> strData = StringIO("53.2 49.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "\n\nLength One"
> print a
> print a.shape
> try:
>     print len(a)
> except TypeError as err:
>     print "ERROR:", err
>
> Which gets me this output:
>
> Length Two
> [(89.230000000000004, 47.200000000000003)
> (13.199999999999999, 42.200000000000003)]
> (2,)
> 2
>
>
> Length One
> (53.200000000000003, 49.200000000000003)
> ()
> ERROR: len() of unsized object
>
>
> Note that this isn't restricted to structured arrays. For regular
> ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
Exactly. The last few lines of the function are:

    X = np.squeeze(X)
    if unpack:
        return X.T
    else:
        return X
>
> >>> a = np.ones((1, 1, 1))
> >>> np.squeeze(a)[0]
> IndexError: 0-d arrays can't be indexed
>
> >>> strData = StringIO("53.2")
> >>> a = np.loadtxt(strData)
> >>> a[0]
> IndexError: 0-d arrays can't be indexed
>
> So, if you have multiple lines with multiple columns, you get a 2-D
> array, as expected.
> If you have a single line of data with multiple columns, you get a 1-D
> array.
> If you have a single column with many lines, you also get a 1-D array
> (which is probably expected, I guess).
> If you have a single column with a single line, you get a scalar
> (actually, a 0-D array).
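
The four cases above can be checked with a short sketch. It is written for Python 3 (io.StringIO rather than the StringIO module used in the quoted code), and the sample data and the names `cases` and `shapes` are made up for illustration:

```python
# Python 3 sketch; the quoted message uses Python 2's StringIO module.
from io import StringIO
import numpy as np

# Hypothetical sample inputs, one per case described above.
cases = {
    "many rows, many columns": "1 2\n3 4",
    "one row, many columns": "1 2",
    "many rows, one column": "1\n2",
    "one row, one column": "1",
}

# Record the shape loadtxt() returns for each input.
shapes = {}
for label, text in cases.items():
    shapes[label] = np.loadtxt(StringIO(text)).shape
    print(label, "->", shapes[label])
```

Note that the two middle cases come back with the same shape, so after the squeeze a caller can no longer tell a single row from a single column.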
>
> Is this a bug or a feature? I can see the advantages of having
> loadtxt() returning the lowest # of dimensions that can hold the given
> data, but it leaves the code vulnerable to certain edge cases. Maybe
> there is a different way I should be doing this, but I feel that this
> behavior at the very least should be included in the loadtxt
> documentation.
>
It would be useful to be able to tell loadtxt to not call squeeze, so a
program that reads column-formatted data doesn't have to treat the case
of a single line specially.
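
Until something like that exists, a caller can undo the squeeze by hand. The following is only a sketch, written for Python 3; `load_records` and `load_table` are hypothetical helper names, not part of NumPy, and `load_table` assumes the caller already knows the number of columns:

```python
from io import StringIO
import numpy as np

def load_records(source, dtype):
    """Hypothetical helper: always return a 1-D structured array."""
    # atleast_1d turns the 0-d result for a one-record file into shape (1,).
    return np.atleast_1d(np.loadtxt(source, dtype=dtype))

def load_table(source, ncols):
    """Hypothetical helper: always return a 2-D (rows, ncols) float array."""
    # The column count must be supplied, because the squeezed result
    # no longer distinguishes a single row from a single column.
    return np.loadtxt(source).reshape(-1, ncols)

dt = [('x', float), ('y', float)]
one = load_records(StringIO("53.2 49.2"), dt)
two = load_records(StringIO("89.23 47.2\n13.2 42.2"), dt)
print(one.shape, len(one))
print(two.shape, len(two))

single = load_table(StringIO("53.2"), ncols=1)
print(single.shape)
```

With these wrappers, len() and indexing work the same way whether the file holds one record or many.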
Warren