Re: [Numpy-discussion] data type specification when using numpy.genfromtxt
27 Jun 2011, 11:30 p.m.
Hi Derek!

I tried with the latest Python(x,y) package, with numpy version 1.6.0. I gave you the data with reduced columns (10 columns) and rows.

b = np.genfromtxt('99Burn2003all_new.csv', delimiter=';', names=True,
                  usecols=tuple(range(10)), dtype=['S10'] + [float for n in range(9)])

works. If you change usecols=tuple(range(10)) to usecols=range(10), it still works.

b = np.genfromtxt('99Burn2003all_new.csv', delimiter=';', names=True, dtype=None)

works, but

b = np.genfromtxt('99Burn2003all_new.csv', delimiter=';', names=True,
                  dtype=['S10'] + [float for n in range(9)])

did not. I use Python(x,y)-2.6.6.1 with numpy 1.6.0 on a 32-bit Windows system. Please don't spend too much time on this if it's not a potential problem.

One final thing: when I try the following (I want to try missing_values in numpy 1.6.0), it gives an error:

In [33]: import StringIO as StringIO

In [34]: data = "1, 2, 3\n4, 5, 6"

In [35]: np.genfromtxt(StringIO(data), delimiter=",", dtype="int,int,int", missing_values=2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
D:\data\LaThuile_ancillary\Jim_Randerson_data\ in ()
TypeError: 'module' object is not callable

I think it must be some problem with my own Python configuration?

Much thanks again, cheers,
Chao

2011/6/27 Derek Homeier:
> Hi Chao,
>
> this seems to have become quite a number of different issues!
> But let's make sure I understand what's going on...
>
> > Thanks very much for your quick reply. I made a short summary of what
> > I've tried. Actually the ['S10'] + [float for n in range(48)] only works
> > when you explicitly specify the columns to be read, and genfromtxt cannot
> > automatically determine the type if you don't specify the type...
> >
> > In [164]: b = np.genfromtxt('99Burn2003all.csv', delimiter=';', names=True,
> >    usecols=tuple(range(49)), dtype=['S10'] + [float for n in range(48)])
> ...
> > But if I use the following, it gives an error:
> >
> > In [171]: b = np.genfromtxt('99Burn2003all.csv', delimiter=';', names=True,
> >    dtype=['S10'] + [float for n in range(48)])
> > ---------------------------------------------------------------------------
> > ValueError                                Traceback (most recent call last)
>
> And the above (without the usecols) did work if you explicitly typed
> dtype=('S10', float, float....)? That by itself would be quite weird,
> because the two should be completely equivalent.
> What happens if you cast the generated list to a tuple -
> dtype=tuple(['S10'] + [float for n in range(48)])?
> If you are using a recent numpy version (1.6.0 or 1.6.1rc1), could you
> please file a bug report with complete machine info etc.? But I suspect this
> might be an older version; you should also be able to simply use
> 'usecols=range(49)' (without the tuple()). Either way, I cannot reproduce
> this behaviour with the current numpy version.
>
> > If I don't specify the dtype, it will not recognize the type of the first
> > column (it displays as nan):
> >
> > In [172]: b = np.genfromtxt('99Burn2003all.csv', delimiter=';', names=True,
> >    usecols=(0,1,2))
> >
> > In [173]: b
> > Out[173]:
> > array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
> >        (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
> >        (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
> >       dtype=[('TIMESTAMP', '<f8'), ...])
>
> You _do_ have to specify 'dtype=None', since the default is 'dtype=float',
> as I remarked in my previous mail. If this does not work, it could be a
> matter of the numpy version again - there were a number of type conversion
> issues fixed between 1.5.1 and 1.6.0.
>
> > Then the final question is: actually the '-999.0' in the data is a missing
> > value, but I cannot display it as 'nan' by specifying missing_values as
> > '-999.0'; whether I set missing_values to -999.0 or use a dictionary,
> > neither works...
> ...
> > Even this doesn't work (suppose 2 is our missing value):
> >
> > In [184]: data = "1, 2, 3\n4, 5, 6"
> >
> > In [185]: np.genfromtxt(StringIO(data), delimiter=",",
> >    dtype="int,int,int", missing_values=2)
> > Out[185]:
> > array([(1, 2, 3), (4, 5, 6)],
> >       dtype=[('f0', '<i4'), ...])
>
> OK, same behaviour here - I found the only tests involving 'valid numbers'
> as missing_values use masked arrays; for regular ndarrays they seem to be
> ignored. I don't know if this is by design - the question is, what do you
> need to do with the data if you know '-999' always means a missing value?
> You could certainly manipulate them after reading in...
> If you have to convert them already on reading in, and using np.mafromtxt
> is not an option, your best bet may be to define a custom converter like
> (note you have to include any blanks, if present)
>
> conv = dict(((n, lambda s: s == ' -999' and np.nan or float(s)) for n in
>              range(1, 49)))
>
> Cheers,
> Derek

--
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 77 30; Fax: 01.69.08.77.16
************************************************************************************
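The TypeError in In [35] above comes from importing the StringIO *module* (`import StringIO as StringIO`) and then calling it as if it were the class; importing the class itself fixes it. A minimal sketch, written for Python 3 where the class lives in `io` (on the thread's Python 2.6, the equivalent is `from StringIO import StringIO`):

```python
import numpy as np
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

# `import StringIO` binds the *module*, so StringIO(data) raises
# "TypeError: 'module' object is not callable"; importing the class works.
data = "1, 2, 3\n4, 5, 6"
arr = np.genfromtxt(StringIO(data), delimiter=",", dtype="int,int,int")
print(arr)
```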
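Derek's point that the generated list and its tuple() cast should be interchangeable as a dtype can be checked directly. A small sketch with made-up two-data-column input (the 49-column file from the thread is not reproduced here, so the shorter range is an assumption for illustration):

```python
import numpy as np
from io import StringIO

data = "2003-01-01;-999.0;-1.028\n2003-01-02;0.167;3.5"
dt = ['S10'] + [float for n in range(2)]  # same construction as in the thread

a = np.genfromtxt(StringIO(data), delimiter=';', dtype=dt, encoding=None)
b = np.genfromtxt(StringIO(data), delimiter=';', dtype=tuple(dt), encoding=None)

# In current numpy both spellings yield identical structured arrays.
assert a.dtype == b.dtype
for name in a.dtype.names:
    assert (a[name] == b[name]).all()
```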
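Derek's converter suggestion, rewritten with a conditional expression instead of the and/or idiom, might look like the sketch below (Python 3; a made-up two-data-column sample stands in for the real file, whose 48 data columns are what his range(1, 49) covers). Comparing the parsed number rather than the raw string also sidesteps having to match any leading blanks exactly:

```python
import numpy as np
from io import StringIO

# One converter per data column, mapping the sentinel -999.0 to NaN.
conv = {n: lambda s: np.nan if float(s) == -999.0 else float(s)
        for n in range(1, 3)}  # the thread's file would use range(1, 49)

data = "2003-01-01;-999.0;-1.028\n2003-01-02;0.167;-999.0"
b = np.genfromtxt(StringIO(data), delimiter=';', encoding=None,
                  dtype=['S10', float, float], converters=conv)
print(b)  # the -999.0 entries come back as nan
```

If a masked array is acceptable instead, Derek's observation points the other way as well: with usemask=True (or np.mafromtxt), a missing_values specification such as '-999.0' is honoured and the sentinel entries come back masked.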