On Thu, Jun 24, 2010 at 1:53 PM, Benjamin Root <ben.root@ou.edu> wrote:

On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser <warren.weckesser@enthought.com> wrote:

Benjamin Root wrote:
> Hi,
>
> I was having the hardest time trying to figure out an intermittent bug
> in one of my programs. Essentially, in some situations, it was
> throwing an error saying that the array object was not an array. It
> took me a while, but then I figured out that my program was assuming
> that the object returned from a loadtxt() call was always a structured
> array (I was using dtypes). However, if the data file being loaded
> only had one data record, then all you get back is a structured record.
>
> import numpy as np
> from StringIO import StringIO
>
> strData = StringIO("89.23 47.2\n13.2 42.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "Length Two"
> print a
> print a.shape
> print len(a)
>
> strData = StringIO("53.2 49.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "\n\nLength One"
> print a
> print a.shape
> try:
>     print len(a)
> except TypeError as err:
>     print "ERROR:", err
>
> Which gets me this output:
>
> Length Two
> [(89.230000000000004, 47.200000000000003)
> (13.199999999999999, 42.200000000000003)]
> (2,)
> 2
>
>
> Length One
> (53.200000000000003, 49.200000000000003)
> ()
> ERROR: len() of unsized object
>
>
> Note that this isn't restricted to structured arrays. For regular
> ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

Exactly. The last four lines of the function are:

    X = np.squeeze(X)
    if unpack:
        return X.T
    else:
        return X
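
To see what that final squeeze does to each case, here is a minimal sketch. It assumes X is a 2-D (rows, columns) array at that point, which holds for plain (non-structured) data; the shapes listed are illustrative, not taken from the loadtxt source.

```python
import numpy as np

# Assumed (rows, columns) shapes of X just before the final squeeze.
shapes = [(3, 2), (1, 2), (3, 1), (1, 1)]

# np.squeeze drops every length-1 axis, so a single row, a single
# column, and a single value all lose dimensions.
squeezed = {shape: np.squeeze(np.ones(shape)).shape for shape in shapes}

for before, after in squeezed.items():
    print(before, "->", after)
```

Only the multi-row, multi-column case keeps both axes; the single value ends up with the empty shape (), which is the 0-D array that cannot be indexed.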

>
> >>> a = np.ones((1, 1, 1))
> >>> np.squeeze(a)[0]
> IndexError: 0-d arrays can't be indexed
>
> >>> strData = StringIO("53.2")
> >>> a = np.loadtxt(strData)
> >>> a[0]
> IndexError: 0-d arrays can't be indexed
>
> So, if you have multiple lines with multiple columns, you get a 2-D
> array, as expected.
> If you have a single line of data with multiple columns, you get a 1-D
> array.
> If you have a single column with many lines, you also get a 1-D array
> (which is probably expected, I guess).
> If you have a single column with a single line, you get a scalar
> (actually, a 0-D array).
>
> Is this a bug or a feature? I can see the advantages of having
> loadtxt() returning the lowest # of dimensions that can hold the given
> data, but it leaves the code vulnerable to certain edge cases. Maybe
> there is a different way I should be doing this, but I feel that this
> behavior at the very least should be included in the loadtxt
> documentation.
>

It would be useful to be able to tell loadtxt to not call squeeze, so a
program that reads column-formatted data doesn't have to treat the case
of a single line specially.
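
In the meantime, one possible workaround when the number of columns is known in advance is to reshape the result. This is only a sketch: load_columns and its ncols argument are hypothetical names, not part of NumPy, and it uses Python 3's io.StringIO rather than the Python 2 StringIO module used earlier in the thread.

```python
from io import StringIO

import numpy as np


def load_columns(fileobj, ncols):
    """Hypothetical helper: always return a 2-D (rows, ncols) array."""
    data = np.loadtxt(fileobj)
    # reshape(-1, ncols) restores whichever axis the squeeze removed,
    # because the column count is supplied by the caller.
    return data.reshape(-1, ncols)


two_rows = load_columns(StringIO("89.23 47.2\n13.2 42.2"), ncols=2)
one_row = load_columns(StringIO("53.2 49.2"), ncols=2)
one_column = load_columns(StringIO("1.0\n2.0\n3.0"), ncols=1)
one_value = load_columns(StringIO("53.2"), ncols=1)

for arr in (two_rows, one_row, one_column, one_value):
    print(arr.shape)
```

This does not apply to structured dtypes, where each record is a single element and the result is at most 1-D.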

Warren

I don't know if that is the best way to solve the problem. In that case, you would always get a 2-D array, right? Is that useful for those who have text data as a single column? Maybe loadtxt() could take a mindim keyword (with None as the default) and apply the appropriate "atleast_Nd()" call (or maybe a general .atleast_nd() function should be available?). But then, what would this mean for structured arrays? One might think that they want at least 2-D, but they really want at least 1-D.
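
A rough sketch of the mindim idea, written as a wrapper rather than a change to loadtxt itself. The name loadtxt_mindim and the mindim keyword are hypothetical; only np.loadtxt and the np.atleast_1d/2d/3d functions are existing NumPy calls. It is written for Python 3 (io.StringIO).

```python
from io import StringIO

import numpy as np


def loadtxt_mindim(fname, mindim=None, **kwargs):
    """Hypothetical wrapper: call np.loadtxt, then pad to mindim dimensions."""
    data = np.loadtxt(fname, **kwargs)
    if mindim is None:
        return data
    promote = {1: np.atleast_1d, 2: np.atleast_2d, 3: np.atleast_3d}
    if mindim not in promote:
        raise ValueError("mindim must be None, 1, 2, or 3")
    return promote[mindim](data)


dtype = [('x', float), ('y', float)]

# A single structured record becomes a length-1 array, so len() works.
one_record = loadtxt_mindim(StringIO("53.2 49.2"), mindim=1, dtype=dtype)
print(one_record.shape, len(one_record))

# A single row of plain data comes back as (1, 2).
one_row = loadtxt_mindim(StringIO("53.2 49.2"), mindim=2)
print(one_row.shape)

# Caveat: a single column also comes back as one row, (1, 3), not (3, 1).
one_column = loadtxt_mindim(StringIO("1.0\n2.0\n3.0"), mindim=2)
print(one_column.shape)
```

The last case shows the limitation of padding after the squeeze: once the length-1 axis is gone, atleast_2d cannot tell a single row from a single column. For structured arrays, mindim=1 looks like the meaningful setting, since each record is already one element.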

Ben Root

P.S. - Taking this a step further, the functions fail completely when given an empty file... In MATLAB, loading an empty file returns an empty array (matrix?).
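
A sketch of how a caller could get the empty-array behavior today without relying on what loadtxt does with empty input. The helper loadtxt_or_empty is hypothetical, it assumes '#' marks comment lines, and it is written for Python 3 (io.StringIO).

```python
from io import StringIO

import numpy as np


def loadtxt_or_empty(fileobj, dtype=float):
    """Hypothetical helper: return a zero-length array when there is no data."""
    # Keep only lines that hold data (skip blank lines and '#' comments).
    lines = [ln for ln in fileobj
             if ln.strip() and not ln.lstrip().startswith('#')]
    if not lines:
        return np.empty((0,), dtype=dtype)
    # atleast_1d keeps a single record from collapsing to a 0-D array.
    return np.atleast_1d(np.loadtxt(StringIO("".join(lines)), dtype=dtype))


dtype = [('x', float), ('y', float)]

empty = loadtxt_or_empty(StringIO(""), dtype=dtype)
print(empty.shape, len(empty))

one = loadtxt_or_empty(StringIO("53.2 49.2\n"), dtype=dtype)
print(one.shape, len(one))
```

With this, code that loops over records or calls len() behaves the same for zero, one, or many records.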