[Numpy-discussion] loadtxt() behavior on single-line files

Warren Weckesser warren.weckesser at enthought.com
Thu Jun 24 14:00:54 EDT 2010


Benjamin Root wrote:
> Hi,
>
> I was having the hardest time trying to figure out an intermittent bug 
> in one of my programs.  Essentially, in some situations, it was 
> throwing an error saying that the array object was not an array.  It 
> took me a while, but then I figured out that my program was assuming 
> that the object returned from a loadtxt() call was always a structured 
> array (I was using dtypes).  However, if the data file being loaded 
> only had one data record, then all you get back is a structured record.
>
> import numpy as np
> from StringIO import StringIO
>
> strData = StringIO("89.23 47.2\n13.2 42.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "Length Two"
> print a
> print a.shape
> print len(a)
>
> strData = StringIO("53.2 49.2")
> a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> print "\n\nLength One"
> print a
> print a.shape
> try:
>     print len(a)
> except TypeError as err:
>     print "ERROR:", err
>
> Which gets me this output:
>
> Length Two
> [(89.230000000000004, 47.200000000000003)
>  (13.199999999999999, 42.200000000000003)]
> (2,)
> 2
>
>
> Length One
> (53.200000000000003, 49.200000000000003)
> ()
> ERROR: len() of unsized object
>
>
> Note that this isn't restricted to structured arrays.  For regular 
> ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():

Exactly.  The last five lines of the function are:

    X = np.squeeze(X)
    if unpack:
        return X.T
    else:
        return X
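
As a rough illustration of what that squeeze call does to the shape of
the data, here is a small sketch.  It does not call loadtxt at all; it
only assumes that the parsed text starts out as a (rows, columns) table,
and the names `table_shapes` and `squeezed` are made up for the example.

```python
import numpy as np

# Possible (rows, columns) shapes of the parsed table before squeezing.
table_shapes = [(2, 2), (1, 2), (2, 1), (1, 1)]

# np.squeeze removes every axis of length 1, so the number of
# dimensions of the result depends on the data, not on the file layout.
squeezed = {shape: np.squeeze(np.ones(shape)).shape for shape in table_shapes}

for shape in table_shapes:
    print(shape, "->", squeezed[shape])
```

A single row and a single column both collapse to the same 1-D shape,
and a single value collapses to a 0-d array, which is why len() and
indexing fail on it.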

>
> >>> a = np.ones((1, 1, 1))
> >>> np.squeeze(a)[0]
> IndexError: 0-d arrays can't be indexed
>
> >>> strData = StringIO("53.2")
> >>> a = np.loadtxt(strData)
> >>> a[0]
> IndexError: 0-d arrays can't be indexed
>
> So, if you have multiple lines with multiple columns, you get a 2-D 
> array, as expected.
> If you have a single line of data with multiple columns, you get a 1-D 
> array.
> If you have a single column with many lines, you also get a 1-D array 
> (which is probably expected, I guess).
> If you have a single column with a single line, you get a scalar 
> (actually, a 0-D array).
>
> Is this a bug or a feature?  I can see the advantages of having 
> loadtxt() returning the lowest # of dimensions that can hold the given 
> data, but it leaves the code vulnerable to certain edge cases.  Maybe 
> there is a different way I should be doing this, but I feel that this 
> behavior at the very least should be included in the loadtxt 
> documentation.
>

It would be useful to be able to tell loadtxt to not call squeeze, so a 
program that reads column-formatted data doesn't have to treat the case 
of a single line specially.
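
Until then, one possible workaround is to restore the missing dimension
after loading.  The sketch below is only an illustration: `load_records`
is a hypothetical helper, not part of NumPy, and it is written with
Python 3 syntax and io.StringIO.  The second half assumes a NumPy
release (1.6 or later) in which loadtxt accepts an `ndmin` argument.

```python
from io import StringIO

import numpy as np


def load_records(source, dtype):
    """Hypothetical helper: always return a 1-D array of records."""
    # np.atleast_1d turns the 0-d result of a one-record file into
    # an array of shape (1,), and leaves longer arrays unchanged.
    return np.atleast_1d(np.loadtxt(source, dtype=dtype))


record_dtype = [('x', float), ('y', float)]
one = load_records(StringIO("53.2 49.2"), record_dtype)
two = load_records(StringIO("89.23 47.2\n13.2 42.2"), record_dtype)
print(one.shape, len(one))
print(two.shape, len(two))

# For plain (non-structured) data, np.atleast_2d cannot tell a single
# row from a single column, so asking loadtxt for two dimensions is
# safer where the ndmin argument is available.
row = np.loadtxt(StringIO("53.2 49.2"), ndmin=2)
col = np.loadtxt(StringIO("1.0\n2.0\n3.0"), ndmin=2)
print(row.shape, col.shape)
```

With either approach the calling code can use len() and indexing
without a special case for one-line files.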

Warren




