numpy.loadtxt() fails with dtype + usecols
Hi,

I was trying to use loadtxt() today to read in some text data, and I ran into a problem when I specified a dtype that contained only as many fields as there are columns in usecols. The example below shows the problem:

    import numpy as np
    import StringIO

    data = '''STID RELH TAIR
    JOE 70.1 25.3
    BOB 60.5 27.9
    '''
    f = StringIO.StringIO(data)
    names = ['stid', 'temp']
    dtypes = ['S4', 'f8']
    arr = np.loadtxt(f, usecols=(0,2), dtype=zip(names,dtypes), skiprows=1)

With current 1.1 (and SVN head), this yields:

    IndexError                                Traceback (most recent call last)

    /home/rmay/<ipython console> in <module>()

    /usr/lib64/python2.5/site-packages/numpy/lib/io.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack)
        309                             for j in xrange(len(vals))]
        310         if usecols is not None:
    --> 311             row = [converterseq[j](vals[j]) for j in usecols]
        312         else:
        313             row = [converterseq[j](val) for j,val in enumerate(vals)]

    IndexError: list index out of range

I've added a patch that checks for usecols and, if it is present, creates the converters dictionary so that each specified column maps to the converter for the corresponding field in the dtype. With the attached patch, this works fine:
    >>> arr
    array([('JOE', 25.300000000000001), ('BOB', 27.899999999999999)],
          dtype=[('stid', '|S4'), ('temp', '<f8')])
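The patch itself is not reproduced in this thread. As a rough sketch of the idea described above, and not the actual numpy code, the snippet below keys the converters by column index, so that column 2 finds the converter for the second dtype field. The helper names build_row_converters and convert_row are hypothetical, and plain str and float stand in for the converters numpy would derive from 'S4' and 'f8'.

```python
def build_row_converters(field_converters, usecols=None):
    """Map each selected column index to the converter for the matching field.

    field_converters: one callable per dtype field, in field order.
    usecols: column indices to read, or None to read columns 0..n-1.
    (Hypothetical helper for illustration; not part of numpy.)
    """
    if usecols is None:
        usecols = range(len(field_converters))
    usecols = list(usecols)
    if len(usecols) != len(field_converters):
        raise ValueError("usecols and dtype must have the same number of entries")
    # Key by column index, so column 2 finds the converter for field 1.
    return dict(zip(usecols, field_converters))


def convert_row(vals, converters, usecols):
    # Look converters up by column index instead of indexing a per-field
    # list with the column index, which is what raised the IndexError.
    return tuple(converters[j](vals[j]) for j in usecols)


usecols = (0, 2)
converters = build_row_converters([str, float], usecols)
rows = [convert_row(line.split(), converters, usecols)
        for line in ["JOE 70.1 25.3", "BOB 60.5 27.9"]]
print(rows)  # [('JOE', 25.3), ('BOB', 27.9)]
```

With a two-element list in place of the dictionary, converters[2] would be out of range, which matches the failure in the traceback.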
Comments? Can I get this in for 1.1.1?

Thanks,

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
On Fri, Jul 18, 2008 at 4:16 PM, Ryan May <rmay31@gmail.com> wrote:
> Comments? Can I get this in for 1.1.1?
Can someone familiar with loadtxt comment on this patch?

Chuck
Looks good to me. I committed the patch to the trunk and added a regression test (r5495).

David

2008/7/18 Charles R Harris <charlesr.harris@gmail.com>:
Can someone familiar with loadtxt comment on this patch?
Chuck
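The regression test committed in r5495 is not shown in the thread. A check along the following lines, written for current Python 3 and numpy, exercises the same combination of usecols and a structured dtype. This is only a sketch under stated assumptions: io.StringIO replaces the Python 2 StringIO module, 'U4' is used instead of 'S4' so the station IDs come back as str, and the names load_stations and test_usecols_with_structured_dtype are made up for illustration.

```python
import io

import numpy as np

data = """STID RELH TAIR
JOE 70.1 25.3
BOB 60.5 27.9
"""


def load_stations(text):
    # Two dtype fields paired with two selected columns (0 and 2).
    dtype = [("stid", "U4"), ("temp", "f8")]
    return np.loadtxt(io.StringIO(text), usecols=(0, 2), dtype=dtype, skiprows=1)


def test_usecols_with_structured_dtype():
    arr = load_stations(data)
    assert arr.dtype.names == ("stid", "temp")
    assert arr["stid"].tolist() == ["JOE", "BOB"]
    assert np.allclose(arr["temp"], [25.3, 27.9])


test_usecols_with_structured_dtype()
```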
participants (3)
- Charles R Harris
- David Huard
- Ryan May