
On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser < warren.weckesser@enthought.com> wrote:
Benjamin Root wrote:
Hi,
I was having the hardest time trying to figure out an intermittent bug in one of my programs. Essentially, in some situations, it was throwing an error saying that the array object was not an array. It took me a while, but then I figured out that my program was assuming that the object returned from a loadtxt() call was always a structured array (I was using dtypes). However, if the data file being loaded only had one data record, then all you get back is a structured record.
import numpy as np
from StringIO import StringIO

strData = StringIO("89.23 47.2\n13.2 42.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "Length Two"
print a
print a.shape
print len(a)

strData = StringIO("53.2 49.2")
a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
print "\n\nLength One"
print a
print a.shape
try:
    print len(a)
except TypeError as err:
    print "ERROR:", err
Which gets me this output:
Length Two
[(89.230000000000004, 47.200000000000003) (13.199999999999999, 42.200000000000003)]
(2,)
2

Length One
(53.200000000000003, 49.200000000000003)
()
ERROR: len() of unsized object
Note that this isn't restricted to structured arrays. For regular ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
Exactly. The last four lines of the function are:
    X = np.squeeze(X)
    if unpack:
        return X.T
    else:
        return X
a = np.ones((1, 1, 1))
np.squeeze(a)[0]
IndexError: 0-d arrays can't be indexed

strData = StringIO("53.2")
a = np.loadtxt(strData)
a[0]
IndexError: 0-d arrays can't be indexed
So, if you have multiple lines with multiple columns, you get a 2-D array, as expected. If you have a single line of data with multiple columns, you get a 1-D array. If you have a single column with many lines, you also get a 1-D array (which is probably expected, I guess). If you have a single column with a single line, you get a scalar (actually, a 0-D array).
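The four cases above can be reproduced with np.squeeze() alone. The following is a minimal sketch in Python 3 syntax. It assumes, for illustration only, that the parsed data starts out as a rows-by-columns array before the squeeze; the names cases and squeezed_ndim are made up for this example.

```python
import numpy as np

# Assumed pre-squeeze shapes (rows, columns) for each kind of input file.
cases = {
    "many lines, many columns": np.ones((3, 2)),
    "one line, many columns": np.ones((1, 2)),
    "many lines, one column": np.ones((3, 1)),
    "one line, one column": np.ones((1, 1)),
}

# np.squeeze() drops every length-1 axis, so the result's dimensionality
# depends on how many rows and columns the data happened to have.
squeezed_ndim = {label: np.squeeze(arr).ndim for label, arr in cases.items()}

for label, ndim in squeezed_ndim.items():
    print("%s -> %d-D" % (label, ndim))
```

Only the first case keeps both axes; the last one collapses to a 0-D array, which is why len() and indexing fail on it.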
Is this a bug or a feature? I can see the advantages of having loadtxt() return the lowest number of dimensions that can hold the given data, but it leaves the code vulnerable to certain edge cases. Maybe there is a different way I should be doing this, but I feel that this behavior at the very least should be included in the loadtxt documentation.
It would be useful to be able to tell loadtxt to not call squeeze, so a program that reads column-formatted data doesn't have to treat the case of a single line specially.
Warren
I don't know if that is the best way to solve the problem. In that case, you would always get a 2-D array, right? Is that useful for those who have text data as a single column? Maybe a mindim keyword (with None as default) and apply an appropriate "atleast_Nd()" call (or maybe have available an .atleast_nd() function?). But, then what would this mean for structured arrays? One might think that they want at least 2-D, but they really want at least 1-D.

Ben Root

P.S. - Taking this a step further, the functions completely fail in dealing with empty files... In MATLAB, it returns an empty array (matrix?).
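For the structured-array case, the "at least 1-D" idea can be tried today on the caller's side. This is only a sketch in Python 3 syntax, not a proposed change to loadtxt() itself: load_records is a hypothetical helper name, and it simply wraps the result in np.atleast_1d(), which is an existing NumPy function.

```python
from io import StringIO

import numpy as np


def load_records(source, dtype):
    """Hypothetical helper: load structured data, always returning a 1-D array."""
    data = np.loadtxt(source, dtype=dtype)
    # A file with a single record may come back as a 0-D array;
    # np.atleast_1d() promotes it to shape (1,) and leaves 1-D input alone.
    return np.atleast_1d(data)


point_dtype = [("x", float), ("y", float)]

one = load_records(StringIO("53.2 49.2"), point_dtype)
two = load_records(StringIO("89.23 47.2\n13.2 42.2"), point_dtype)

print(len(one), one.shape)
print(len(two), two.shape)
```

With this wrapper, len() and iteration work the same whether the file holds one record or many. It does not settle the 2-D question for plain ndarrays, where a single line and a single column both squeeze down to 1-D and cannot be told apart afterwards.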