data type specification when using numpy.genfromtxt
Dear all numpy users,

I want to read a CSV file with many (49) columns; the first column is a string and the remaining ones are floats. How can I avoid typing out something like

data = numpy.genfromtxt('data.csv', delimiter=';', names=True, dtype=('S10', float, float, ......))

Can I just specify that the first column is a string and the remaining ones are floats? How can I do that?

Thanks a lot,
Chao

--
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 77 30; Fax: 01.69.08.77.16
On 26.06.2011, at 8:48PM, Chao YUE wrote:
I want to read a CSV file with many (49) columns; the first column is a string and the remaining ones are floats. How can I avoid typing out something like
data = numpy.genfromtxt('data.csv', delimiter=';', names=True, dtype=('S10', float, float, ......))
Can I just specify that the first column is a string and the remaining ones are floats? How can I do that?
Simply use 'dtype=None' to let genfromtxt determine the types automatically. (It is perhaps a bit confusing that this is not the default; maybe the docstring should repeat, for clarity, that the default for dtype is 'float'.) Also, a shorter way of writing the dtype above (e.g. in case some columns would otherwise be auto-detected as int) would be ['S10'] + [ float for n in range(48) ]

HTH, Derek
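The two suggestions above can be tried side by side on a small in-memory sample. This is only a sketch under assumptions: the three-column sample data and the 'H2O_flux' column name are made up for illustration (the real file has 49 columns), and encoding='utf-8' is passed on the assumption of a NumPy version that accepts the encoding argument.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for the 49-column file.
sample = (
    "TIMESTAMP;CO2_flux;H2O_flux\n"
    "01/01/2003;-999.0;-1.028\n"
    "02/01/2003;0.5;-0.409\n"
)

n_float_cols = 2  # would be 48 for the file discussed in this thread

# One 10-byte string column followed by n_float_cols float columns.
col_types = ['S10'] + [float] * n_float_cols

# Explicit per-column types; field names are taken from the header line.
explicit = np.genfromtxt(io.StringIO(sample), delimiter=';',
                         names=True, dtype=col_types)

# dtype=None asks genfromtxt to infer each column's type individually.
inferred = np.genfromtxt(io.StringIO(sample), delimiter=';',
                         names=True, dtype=None, encoding='utf-8')

print(explicit.dtype)
print(inferred.dtype)
```

Writing [float] * n_float_cols is equivalent to the list comprehension and avoids the unused loop variable.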
On Sun, Jun 26, 2011 at 2:27 PM, Derek Homeier < derek@astro.physik.uni-goettingen.de> wrote:
Simply use 'dtype=None' to let genfromtxt determine the types automatically. (It is perhaps a bit confusing that this is not the default; maybe the docstring should repeat, for clarity, that the default for dtype is 'float'.) Also, a shorter way of writing the dtype above (e.g. in case some columns would otherwise be auto-detected as int) would be ['S10'] + [ float for n in range(48) ]
Another possibility, if you don't need the string data in the first column, is to simply skip that column by specifying the "usecols" parameter.

Ben Root
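As a rough illustration of the usecols approach, the sketch below skips column 0 and reads the remaining columns with the default float dtype. The sample data and the 'H2O_flux' column name are assumptions made for this example; for the 49-column file the column count would be 49.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for the 49-column file.
sample = (
    "TIMESTAMP;CO2_flux;H2O_flux\n"
    "01/01/2003;-999.0;-1.028\n"
    "02/01/2003;0.5;-0.409\n"
)

n_cols = 3  # would be 49 for the file discussed in this thread

# Read every column except the first (string) one; dtype stays at its
# default of float, and the header names of the kept columns are used.
floats_only = np.genfromtxt(io.StringIO(sample), delimiter=';',
                            names=True, usecols=tuple(range(1, n_cols)))

print(floats_only.dtype.names)
print(floats_only['H2O_flux'])
```

The trade-off is that the timestamps are dropped entirely, so this only helps when the string column is not needed.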
Hi Derek,

Thanks very much for your quick reply. Here is a short summary of what I've
tried. Actually, ['S10'] + [ float for n in range(48) ] only works
when you explicitly specify the columns to be read, and genfromtxt cannot
automatically determine the types if you don't specify them....
I also have a problem with missing values, which I describe at the end of
this mail. Sorry for the very long example....

Thanks again,
In [164]: b = np.genfromtxt('99Burn2003all.csv', delimiter=';', names=True, usecols=tuple(range(49)), dtype=['S10'] + [ float for n in range(48)])
In [165]: b
Out[165]:
array([ ('01/01/2003', -999.0, -1.028, -999.0, -999.0, -999.0, -999.0, -999.0,
        -25.368400000000001, 0.75920799999999999, -25.425699999999999,
        0.77632199999999996, -25.220500000000001, 0.77561899999999995,
        0.20000000000000001, 280.089, 0.57458299999999995, 0.417018,
        -0.042441800000000002, 0.0428254, -0.18517600000000001,
        -0.056775800000000001, 93.721299999999999, -8.1318099999999998,
        -9.5244, -9.9323200000000007, -10.2728, -20.945499999999999,
        -8.4939999999999998, -9.5678199999999993, -9.9175500000000003,
        -9.7835400000000003, -10.4445, -999.0, -999.0, -999.0, -999.0,
        -999.0, -2.80863, -6.7711100000000002, -999.0, -999.0, -999.0,
        0.109, 0.075999999999999998, 0.10000000000000001,
        0.074999999999999997, 0.0, -999.0),
       ('01/01/2003', -999.0, -0.40899999999999997, -999.0, -999.0, -999.0,
        -999.0, -999.0, -25.3233, 0.75929800000000003, -25.368600000000001,
        0.77451599999999998, -25.118400000000001, 0.77264200000000005,
        0.20499999999999999, 267.80599999999998, 0.59291700000000003,
        0.42051699999999997, -0.037141399999999998, 0.040433200000000002,
        -0.16375999999999999, -0.029456400000000001, 93.749099999999999,
        -8.1292799999999996, -9.5213800000000006, -9.9336199999999995,
        -10.274900000000001, -21.1402, -8.4918899999999997,
        -9.5663699999999992, -9.9207000000000001, -9.7896099999999997,
        -10.4514, -999.0, -999.0, -999.0, -999.0, -999.0, -2.8468,
        -6.7986899999999997, -999.0, -999.0, -999.0, 0.109,
        0.075999999999999998, 0.10000000000000001, 0.074999999999999997,
        0.0, -999.0),
....
dtype=[('TIMESTAMP', '|S10'), ('CO2_flux', '
participants (3)
- Benjamin Root
- Chao YUE
- Derek Homeier