Hi, I am trying to use numpy.genfromtxt to import a CSV file for classification with scikit-learn. The first column of the CSV is a string class label, and the remaining 200+ columns are integer features. How can I use genfromtxt to give the first column a string dtype and all the other columns an int dtype?
I have tried using dtype=None as shown below, but when I print dataset.shape I get (number_of_rows,), i.e. no columns seem to be read in:
dataset = np.genfromtxt(file, delimiter=',', skip_header=True, dtype=None)
I also tried setting the dtypes explicitly, as in the examples below, but I get the same result as with dtype=None:
a: dataset = np.genfromtxt(file, delimiter=',', skip_header=True, dtype=['S2'] + [int for n in range(241)])
b: dataset = np.genfromtxt(file, delimiter=',', skip_header=True, dtype=['S2'] + [int for n in range(241)], names=True)
Any thoughts? Thanks for your assistance.
Dammy
-- View this message in context: http://numpy-discussion.10968.n7.nabble.com/Using-gentxt-to-import-a-csv-wit... Sent from the Numpy-discussion mailing list archive at Nabble.com.
Hi Dammy,
I really don't know how to test your issue, but you could try np.loadtxt, or, as a last resort, pandas (read_csv) could do this for you.
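For the np.loadtxt route, a minimal sketch (assuming Python 3, a recent NumPy, a single header row and the label in column 0; the CSV text and column names below are made up for illustration): reading the data twice with usecols returns the labels as a 1-D string array and the features as an ordinary 2-D int array, instead of one structured array.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: a header row, a string label,
# then integer features (the real file has 200+ feature columns).
csv_text = """label,f1,f2,f3
cat,1,2,3
dog,4,5,6
cat,7,8,9
"""

# Count the columns from the header so the feature range is not hard-coded.
n_cols = len(csv_text.splitlines()[0].split(","))

# Column 0 only: the class labels as strings (skiprows=1 skips the header).
y = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1,
               usecols=0, dtype=str)

# Columns 1..n_cols-1: the integer features as a plain 2-D array.
X = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1,
               usecols=range(1, n_cols), dtype=int)

print(y)        # ['cat' 'dog' 'cat']
print(X.shape)  # (3, 3)
```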
Cheers, Arnaldo.
On Thu, May 7, 2015 at 2:26 AM, Dammy damilarefagbemi@gmail.com wrote:
> I also tried setting the dtypes as shown in the examples below, but I get the same error as dtype=None:
These dtypes create structured arrays: http://docs.scipy.org/doc/numpy/user/basics.rec.html
so it is expected that the shape is just the number of rows. The columns are fields of the dtype and can be accessed by name, like a dictionary:
In [21]: d = np.ones(3, dtype='S2, int8')

In [22]: d
Out[22]: array([('1', 1), ('1', 1), ('1', 1)], dtype=[('f0', 'S2'), ('f1', 'i1')])

In [23]: d.shape
Out[23]: (3,)

In [24]: d.dtype.names
Out[24]: ('f0', 'f1')

In [25]: d[0]
Out[25]: ('1', 1)

In [26]: d['f0']
Out[26]: array(['1', '1', '1'], dtype='|S2')

In [27]: d['f1']
Out[27]: array([1, 1, 1], dtype=int8)
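Applied to the original question, a minimal sketch (assuming Python 3 and a recent NumPy; the small CSV text, the column names label, f1, f2, f3 and the "U10" label width are made up for illustration). It passes names=True without skip_header, because genfromtxt reads the names from the first line after the skipped lines, so combining it with skip_header=True would take the names from the first data row. The dtype is one string type followed by one int per feature; the label field is then pulled out and the feature fields are stacked into the 2-D array that scikit-learn expects.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: a header row, a string label,
# then integer features (the real file has 200+ feature columns).
csv_text = """label,f1,f2,f3
cat,1,2,3
dog,4,5,6
cat,7,8,9
"""
n_features = len(csv_text.splitlines()[0].split(",")) - 1

# names=True takes the field names from the header line, so skip_header is
# not passed. "U10" assumes labels of at most 10 characters.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                     dtype=["U10"] + [int] * n_features, encoding="utf-8")

print(data.shape)        # (3,): one structured record per row
print(data.dtype.names)  # ('label', 'f1', 'f2', 'f3')

# Labels: a 1-D string array.
y = data["label"]
# Features: stack the integer fields side by side into a 2-D array.
X = np.column_stack([data[name] for name in data.dtype.names[1:]])
print(X.shape)           # (3, 3)
```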