On Thu, Jun 24, 2010 at 1:53 PM, Benjamin Root <ben.root@ou.edu> wrote:
On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser <warren.weckesser@enthought.com> wrote:
Benjamin Root wrote:
> Hi,
>
> I was having the hardest time trying to figure out an intermittent bug
> in one of my programs.  Essentially, in some situations, it was
> throwing an error saying that the array object was not an array.  It
> took me a while, but then I figured out that my program was assuming
> that the object returned from a loadtxt() call was always a structured
> array (I was using dtypes).  However, if the data file being loaded
> only had one data record, then all you get back is a structured record.
>
> import numpy as np
> from StringIO import StringIO
>
> strData = StringIO("89.23 47.2\n13.2 42.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "Length Two"
> print a
> print a.shape
> print len(a)
>
> strData = StringIO("53.2 49.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "\n\nLength One"
> print a
> print a.shape
> try:
>     print len(a)
> except TypeError as err:
>     print "ERROR:", err
>
> Which gets me this output:
>
> Length Two
> [(89.230000000000004, 47.200000000000003)
>  (13.199999999999999, 42.200000000000003)]
> (2,)
> 2
>
>
> Length One
> (53.200000000000003, 49.200000000000003)
> ()
> ERROR: len() of unsized object
>
>
> Note that this isn't restricted to structured arrays.  For regular
> ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

Exactly.  The function ends with:

   X = np.squeeze(X)
   if unpack:
       return X.T
   else:
       return X

>
> >>> a = np.ones((1, 1, 1))
> >>> np.squeeze(a)[0]
> IndexError: 0-d arrays can't be indexed
>
> >>> strData = StringIO("53.2")
> >>> a = np.loadtxt(strData)
> >>> a[0]
> IndexError: 0-d arrays can't be indexed
>
> So, if you have multiple lines with multiple columns, you get a 2-D
> array, as expected.
> If you have a single line of data with multiple columns, you get a 1-D
> array.
> If you have a single column with many lines, you also get a 1-D array
> (which is probably expected, I guess).
> If you have a single column with a single line, you get a scalar
> (actually, a 0-D array).
>
> Is this a bug or a feature?  I can see the advantages of having
> loadtxt() returning the lowest # of dimensions that can hold the given
> data, but it leaves the code vulnerable to certain edge cases.  Maybe
> there is a different way I should be doing this, but I feel that this
> behavior at the very least should be included in the loadtxt
> documentation.
>

It would be useful to be able to tell loadtxt to not call squeeze, so a
program that reads column-formatted data doesn't have to treat the case
of a single line specially.
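
Until loadtxt offers such an option, the special case can be handled on the caller's side. The following is only a minimal sketch of that idea for the structured-array example above, written for Python 3; load_records is a hypothetical helper name, not part of NumPy, and it assumes the caller always wants a 1-D array of records.

```python
# Python 3 sketch. load_records is a hypothetical helper, not a NumPy function.
from io import StringIO

import numpy as np

DTYPE = [('x', float), ('y', float)]


def load_records(source, dtype):
    """Load structured text data and always return a 1-D record array."""
    data = np.loadtxt(source, dtype=dtype)
    # A one-record file comes back as a 0-d array; atleast_1d promotes it
    # to shape (1,) and leaves longer arrays unchanged.
    return np.atleast_1d(data)


one = load_records(StringIO("53.2 49.2"), DTYPE)
two = load_records(StringIO("89.23 47.2\n13.2 42.2"), DTYPE)

print(one.shape, len(one))  # (1,) 1
print(two.shape, len(two))  # (2,) 2
```

With this wrapper, len() and indexing behave the same way whether the file holds one record or many.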

Warren

I don't know if that is the best way to solve the problem.  In that case, you would always get a 2-D array, right?  Is that useful for those who have text data as a single column?  Maybe a mindim keyword (with None as default) and apply an appropriate "atleast_Nd()" call (or maybe have available an .atleast_nd() function?).  But, then what would this mean for structured arrays?  One might think that they want at least 2-D, but they really want at least 1-D.
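
To make the mindim idea concrete, here is a rough sketch written for Python 3. Both loadtxt_mindim and its mindim argument are hypothetical and exist only for illustration; the wrapper simply post-processes whatever loadtxt returns with the existing np.atleast_1d/2d/3d functions.

```python
# Python 3 sketch. loadtxt_mindim and its mindim keyword are hypothetical;
# only np.loadtxt and the np.atleast_*d functions are real NumPy calls.
from io import StringIO

import numpy as np

_PROMOTE = {1: np.atleast_1d, 2: np.atleast_2d, 3: np.atleast_3d}


def loadtxt_mindim(source, mindim=None, **kwargs):
    """Call np.loadtxt, then pad the result up to at least `mindim` dims."""
    data = np.loadtxt(source, **kwargs)
    if not mindim:  # None or 0: keep the default behavior
        return data
    return _PROMOTE[mindim](data)


# One row of three columns and one column of three rows both load as 1-D,
# so promoting to 2-D cannot tell them apart: both become shape (1, 3).
row = loadtxt_mindim(StringIO("1.0 2.0 3.0"), mindim=2)
col = loadtxt_mindim(StringIO("1.0\n2.0\n3.0"), mindim=2)
print(row.shape, col.shape)

# A structured array with a single record only needs mindim=1.
rec = loadtxt_mindim(StringIO("53.2 49.2"), mindim=1,
                     dtype=[('x', float), ('y', float)])
print(rec.shape, len(rec))
```

The single-column case shows the limitation: once the array has been squeezed, the row/column orientation is lost, so a real fix would probably have to live inside loadtxt rather than in a wrapper like this one.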

Ben Root

P.S. - Taking this a step further, the functions completely fail in dealing with empty files...  In MATLAB, it returns an empty array (matrix?).

I am reviving this "dead" thread to note that I have filed ticket #1562 on the numpy Trac about this issue: http://projects.scipy.org/numpy/ticket/1562

Ben Root