[Numpy-discussion] Using genfromtxt to import a csv with a string class label and hundreds of integer features

Wed May 6 20:26:02 EDT 2015

Hi,
I am trying to use numpy.genfromtxt to import a csv for classification using
scikit-learn. The first column in the csv is a string class label, while the
remaining 200+ columns are integer features.
How can I use the genfromtxt function to specify a string dtype for the first
column and an int dtype for all the other columns?

I have tried using "dtype=None" as shown below, but when I print
dataset.shape, I get (number_of_rows,), i.e. no columns are read in:
 dataset = np.genfromtxt(file, delimiter=',', skip_header=True, dtype=None)

I also tried setting the dtypes explicitly as shown in the examples below, but
I get the same result as with dtype=None:
a: dataset = np.genfromtxt(file, delimiter=',', skip_header=True,
dtype=['S2'] + [int for n in range(241)])
b: dataset = np.genfromtxt(file, delimiter=',', skip_header=True,
dtype=['S2'] + [int for n in range(241)], names=True)
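
To make the problem easier to reproduce, here is a minimal sketch using a small
made-up csv held in memory (three integer features instead of 241; the column
names and values are invented for illustration and are not my real data). It
assumes a NumPy version recent enough to accept the encoding argument. It shows
the one-dimensional shape described above and one possible way of splitting
the resulting structured array into a label array and a 2-D integer matrix:

```python
import io

import numpy as np

# Hypothetical stand-in for the real file: a header row, a string class
# label in the first column, then three integer feature columns.
csv_text = "label,f1,f2,f3\nAA,1,2,3\nBB,4,5,6\nAA,7,8,9\n"

# names=True reads the field names from the header line, and dtype=None
# lets genfromtxt infer a type per column. Mixed column types produce a
# structured array: one record per row, one named field per column.
dataset = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    names=True,
    dtype=None,
    encoding='utf-8',  # assumption: NumPy >= 1.14, returns str rather than bytes
)

print(dataset.shape)        # one-dimensional, one entry per row
print(dataset.dtype.names)  # the columns are present as named fields

# Split the records into labels and a plain 2-D integer feature matrix.
labels = dataset['label']
feature_names = dataset.dtype.names[1:]
features = np.column_stack([dataset[name] for name in feature_names])

print(labels)
print(features.shape)
```

If this is right, the columns are being read after all, but as fields of each
record rather than as a second array dimension.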

Any thoughts? Thanks for your assistance.

Dammy
