New subject: data type specification when using numpy.genfromtxt

27 Jun 2011

Hi Derek!

I tried with the latest version of the Python(x,y) package, with numpy version
1.6.0. I sent you the data with a reduced number of columns (10 columns) and rows.

b = np.genfromtxt('99Burn2003all_new.csv', delimiter=';', names=True,
                  usecols=tuple(range(10)), dtype=['S10'] + [float for n in range(9)])
works. If you change usecols=tuple(range(10)) to usecols=range(10), it still
works.

b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=None)
works.
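A tiny in-memory sample shows why dtype=None has to be given explicitly: genfromtxt's default is dtype=float, so a text column cannot be converted and comes back as nan, while dtype=None asks genfromtxt to guess a type per column. This is a sketch assuming a recent NumPy under Python 3 (io.StringIO and the encoding argument); the three-column text is a hypothetical stand-in for 99Burn2003all_new.csv, not the real data.

```python
from io import StringIO

import numpy as np

# Hypothetical stand-in for the first three columns of the CSV file.
TEXT = (
    "TIMESTAMP;CO2_flux;Net_radiation\n"
    "2003-01-01;-999.0;-1.028\n"
    "2003-01-02;0.5;0.167\n"
)

# Default dtype=float: every field is parsed as a float, so the
# timestamp strings cannot be converted and are replaced by nan.
as_float = np.genfromtxt(StringIO(TEXT), delimiter=";", names=True,
                         encoding="utf-8")

# dtype=None: genfromtxt guesses a type per column, so the first
# column stays text and the other two become float64.
guessed = np.genfromtxt(StringIO(TEXT), delimiter=";", names=True,
                        dtype=None, encoding="utf-8")

print(as_float["TIMESTAMP"])  # nan for every row
print(guessed["TIMESTAMP"])   # the original strings
print(guessed.dtype)
```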

But
b = np.genfromtxt('99Burn2003all_new.csv', delimiter=';', names=True,
                  dtype=['S10'] + [float for n in range(9)])
didn't work.

I am using Python(x,y)-2.6.6.1 with numpy 1.6.0 on a 32-bit Windows system.

Please don't spend too much time on this if it's not a potential problem.

The final thing: when I try the following (I want to try the missing_values
option in numpy 1.6.0), it gives an error:

In [33]: import StringIO as StringIO

In [34]: data = "1, 2, 3\n4, 5, 6"

In [35]: np.genfromtxt(StringIO(data), delimiter=",", dtype="int,int,int", missing_values=2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()

TypeError: 'module' object is not callable

I think it must be some problem with my own Python configuration?
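The TypeError above comes from the import line rather than from the Python installation: `import StringIO as StringIO` binds the name StringIO to the module, so `StringIO(data)` tries to call a module. On Python 2 the class is reached with `from StringIO import StringIO`. The sketch below assumes Python 3, where the equivalent class lives in the io module, reproduces the same mix-up with io, and then reruns the original call.

```python
import io                 # binds the *module* io
from io import StringIO   # binds the *class* StringIO

import numpy as np

data = "1, 2, 3\n4, 5, 6"

# Calling a module instead of a class reproduces the error from the thread.
error_message = ""
try:
    io(data)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# With the class imported, the original genfromtxt call runs.
arr = np.genfromtxt(StringIO(data), delimiter=",", dtype="int,int,int",
                    missing_values=2)
print(arr)
```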

Many thanks again,

cheers,

Chao

2011/6/27 Derek Homeier <derek@astro.physik.uni-goettingen.de>

> Hi Chao,
>
> this seems to have become quite a number of different issues!
> But let's make sure I understand what's going on...
>
> > Thanks very much for your quick reply. Here is a short summary of what
> > I've tried. Actually, ['S10'] + [ float for n in range(48) ] only works
> > when you explicitly specify the columns to be read, and genfromtxt cannot
> > automatically determine the type if you don't specify the type....
> >
>
> > In [164]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=tuple(range(49)),dtype=['S10'] + [ float for n in range(48)])
> ...
> > But if I use the following, it gives error:
> >
> > In [171]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(48)])
> >
> > ---------------------------------------------------------------------------
> > ValueError                                Traceback (most recent call last)
> >
> And the above (without the usecols) did work if you explicitly typed
> dtype=('S10', float, float....)? That by itself would be quite weird,
> because the two should be completely equivalent.
> What happens if you cast the generated list to a tuple -
> dtype=tuple(['S10'] + [ float for n in range(48)])?
> If you are using a recent numpy version (1.6.0 or 1.6.1rc1), could you
> please file a bug report with complete machine info etc.? But I suspect this
> might be an older version. You should also be able to simply use
> 'usecols=range(49)' (without the tuple()). Either way, I cannot reproduce
> this behaviour with the current numpy version.
>
> > If I don't specify the dtype, it will not recognize the type of the first
> > column (it displays as nan):
> >
> > In [172]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2))
> >
> > In [173]: b
> > Out[173]:
> > array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
> >        (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
> >        (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
> >       dtype=[('TIMESTAMP', '<f8'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])
> >
> You _do_ have to specify 'dtype=None', since the default is 'dtype=float',
> as I have remarked in my previous mail. If this does not work, it could be a
> matter of the numpy version again - there were a number of type conversion
> issues fixed between 1.5.1 and 1.6.0.
> >
> > Then the final question: the '-999.0' in the data is actually a missing
> > value, but I cannot get it displayed as 'nan' by specifying missing_values
> > as '-999.0':
> > whether I set missing_values to -999.0 or use a dictionary, neither works...
> ...
> >
> > Even this doesn't work (suppose 2 is our missing_value),
> > In [184]: data = "1, 2, 3\n4, 5, 6"
> >
> > In [185]: np.genfromtxt(StringIO(data), delimiter=",", dtype="int,int,int", missing_values=2)
> > Out[185]:
> > array([(1, 2, 3), (4, 5, 6)],
> >       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
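For comparison, the same toy data can be read with usemask=True, the genfromtxt option that np.mafromtxt switches on. In that mode a numeric missing value such as 2 is recorded in the mask instead of being silently kept as in Out[185]. This is a sketch assuming a recent NumPy under Python 3; the field names f0, f1, f2 are the defaults genfromtxt gives to an unnamed "int,int,int" dtype.

```python
from io import StringIO

import numpy as np

data = "1, 2, 3\n4, 5, 6"

# usemask=True returns a MaskedArray; entries equal to a missing value
# (here the number 2) are masked rather than kept as ordinary data.
masked = np.genfromtxt(StringIO(data), delimiter=",", dtype="int,int,int",
                       missing_values=2, usemask=True)

print(masked)
# Per-field mask: True where the value was flagged as missing.
print(np.ma.getmaskarray(masked["f1"]))
```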
>
> OK, same behaviour here - I found the only tests involving 'valid numbers'
> as missing_values use masked arrays; for regular ndarrays they seem to be
> ignored. I don't know if this is by design - the question is, what do you
> need to do with the data if you know ' -999' always means a missing value?
> You could certainly manipulate them after reading in...
> If you have to convert them already on reading in, and using np.mafromtxt
> is not an option, your best bet may be to define a custom converter like
> (note you have to include any blanks, if present)
>
> conv = dict((n, lambda s: np.nan if s == ' -999' else float(s)) for n in range(1, 49))
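The converter idea can be tried end to end on a small sample, next to the simpler option of replacing the sentinel after reading. This sketch assumes a recent NumPy under Python 3 with encoding="utf-8" (so the converter receives str rather than bytes); the sample text and the helper name sentinel_to_nan are hypothetical. Unlike the string comparison against ' -999', it compares the parsed number, so ' -999', '-999' and '-999.0' are all caught without worrying about blanks.

```python
from io import StringIO

import numpy as np

# Hypothetical sample: -999 marks a missing value in the numeric columns.
TEXT = (
    "TIMESTAMP;CO2_flux;Net_radiation\n"
    "2003-01-01;-999.0;-1.028\n"
    "2003-01-02;0.5; -999\n"
)
SENTINEL = -999.0


def sentinel_to_nan(s):
    """Hypothetical converter: parse a field, mapping the sentinel to nan."""
    value = float(s)  # float() tolerates surrounding blanks such as ' -999'
    return np.nan if value == SENTINEL else value


# Option 1: convert while reading; columns 1 and 2 are the numeric ones.
converted = np.genfromtxt(StringIO(TEXT), delimiter=";", names=True,
                          dtype=None, encoding="utf-8",
                          converters={1: sentinel_to_nan, 2: sentinel_to_nan})

# Option 2: read as usual, then replace the sentinel in each float field.
raw = np.genfromtxt(StringIO(TEXT), delimiter=";", names=True,
                    dtype=None, encoding="utf-8")
for name in ("CO2_flux", "Net_radiation"):
    column = raw[name]  # field access returns a view, so raw is updated
    column[column == SENTINEL] = np.nan

print(converted)
print(raw)
```

Converting while reading keeps the sentinel out of the array from the start; replacing it afterwards leaves the parsing untouched and is easier to undo.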
>
> Cheers,
>                                                 Derek
>
>

-- 
***********************************************************************************
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 77 30; Fax:01.69.08.77.16
************************************************************************************

Re: [Numpy-discussion] data type specification when using numpy.genfromtxt
