Re: [Numpy-discussion] data type specification when using numpy.genfromtxt

28 Jun 2011

Thanks very much!! You are right. It's because of the extra semicolon in the
header row. I have no problems anymore.

I thank you for your time.

Cheers,

Chao

2011/6/28 Derek Homeier <derek@astro.physik.uni-goettingen.de>

> Hi Chao,
>
> by mistake I did not reply to the list last time...
>
> On 27.06.2011, at 10:30PM, Chao YUE wrote:
> Hi Derek!
> >
> > I tried with the latest version of the python(x,y) package with numpy
> > version 1.6.0. I gave the data to you with reduced columns (10 columns)
> > and rows.
> >
> >
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,usecols=tuple(range(10)),dtype=['S10']
> + [ float for n in range(9)]) works.
> > if you change  usecols=tuple(range(10))  to usecols=range(10), it still
> works.
> >
> >
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=None)
> works.
> >
> > but
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=['S10']
> + [ float for n in range(9)]) didn't work.
> >
> > I use Python(x,y)-2.6.6.1 with numpy version as 1.6.0, I use windows
> 32-bit system.
> >
> > Please don't spend too much time on this if it's not a potential problem.
> >
> OK, dtype=None works on 1.6.0, that's the important bit.
> From your example file it seems the dtype list does not work without
> specifying usecols, because your header contains an excess semicolon in the
> field "Air temperature (High; HMP45C)", thus genfromtxt expects more data
> columns than actually exist. If you replace the semicolon you should be set
> (or, if I may suggest, write another header line with catchier field names
> so you don't have to work with array fields like "b['Water vapor density by
> LiCor 7500']"  ;-).
> Otherwise both options work for me with python2.6+numpy-1.5.1 as well as
> 1.6.0/1.6.1rc1.
>
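A quick way to spot this kind of header/data mismatch before calling genfromtxt is to count the delimiter-separated fields on each line. The following is only a minimal sketch in Python 3: the sample text is made up for illustration (it is not the real CSV file), and count_fields is a hypothetical helper, not part of numpy.

```python
import io

# Made-up sample: the header has a stray ';' inside one field name,
# so it splits into one more field than the data rows do.
sample = (
    "TIMESTAMP;Air temperature (High; HMP45C);CO2_flux\n"
    "2003-01-01;1.5;-1.028\n"
    "2003-01-02;2.0;-0.409\n"
)

def count_fields(text, delimiter=";"):
    """Return the number of delimiter-separated fields on each line."""
    return [len(line.split(delimiter)) for line in io.StringIO(text)]

field_counts = count_fields(sample)
print(field_counts)  # the header count differs from the data-row count
```

If the first number differs from the rest, the header line is the place to look.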
> I am curious though why your python interpreter gave this error message:
> > ValueError                                Traceback (most recent call last)
> >
> > D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()
> >
> > C:\Python26\lib\site-packages\numpy\lib\npyio.pyc in genfromtxt(fname,
> > dtype, comments, delimiter, skiprows, skip_header, skip_footer, converters,
> > missing, missing_values, filling_values, usecols, names, excludelist,
> > deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack,
> > usemask, loose, invalid_raise)
> >    1449             # Raise an exception ?
> >
> >    1450             if invalid_raise:
> > -> 1451                 raise ValueError(errmsg)
> >    1452             # Issue a warning ?
> >
> >    1453             else:
> >
> > ValueError
>
> since ipython2.6 on my Mac reported this:
> ...
>    1450             if invalid_raise:
> -> 1451                 raise ValueError(errmsg)
>   1452             # Issue a warning ?
>
>   1453             else:
>
> ValueError: Some errors were detected !
>    Line #3 (got 10 columns instead of 11)
>    Line #4 (got 10 columns instead of 11)
> etc....
> which of course provided the right lead to the problem - was the actual
> errmsg really missing, or did you cut the message too soon?
>
> > the final thing is, when I try to do this (I want to try the
> missing_values in numpy 1.6.0), it gives error:
> >
> > In [33]: import StringIO as StringIO
> >
> > In [34]: data = "1, 2, 3\n4, 5, 6"
> >
> > In [35]: np.genfromtxt(StringIO(data), delimiter=",",dtype="int,int,int",missing_values=2)
> >
> ---------------------------------------------------------------------------
> > TypeError                                 Traceback (most recent call
> last)
> >
> > D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in
> <module>()
> >
> > TypeError: 'module' object is not callable
> >
> You want to use "from StringIO import StringIO" (or write
> "StringIO.StringIO(data)").
> But again, this will not work the way you expect it to with int/float
> numbers set as missing_values, and reading to regular arrays. I've tested
> this on 1.6.1 and the current development branch as well, and the
> missing_values are only considered for masked arrays. This is not likely to
> change soon, and may actually be intentional, so to process those numbers on
> read-in, your best option would be to define a custom set of
> "converters=conv" as shown in my last mail.
>
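For the archive, the import fix can be sketched as below. This assumes Python 3, where the class lives in the io module; on the Python 2.6 setup discussed in this thread the equivalent line is "from StringIO import StringIO". The list comprehension merely stands in for a reader such as genfromtxt, so the sketch runs without numpy.

```python
# Python 3 spelling (assumption); on Python 2 use "from StringIO import StringIO".
from io import StringIO

data = "1, 2, 3\n4, 5, 6"

# StringIO here is the class, so calling it returns a file-like object.
# Importing the module under the same name and calling it is what raises
# "TypeError: 'module' object is not callable".
buffer = StringIO(data)

# Stand-in for a reader: parse each comma-separated line into integers.
rows = [[int(value) for value in line.split(",")] for line in buffer]
print(rows)
```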
> Cheers,
>                                                        Derek
>
> > 2011/6/27 Derek Homeier <derek@astro.physik.uni-goettingen.de>
> > Hi Chao,
> >
> > this seems to have become quite a number of different issues!
> > But let's make sure I understand what's going on...
> >
> > > Thanks very much for your quick reply. I make a short summary of what
> I've tried. Actually the ['S10'] + [ float for n in range(48) ] only works
> when you explicitly specify the columns to be read, and genfromtxt cannot
> automatically determine the type if you don't specify the type....
> > >
> >
> > > In [164]:
> b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=tuple(range(49)),dtype=['S10']
> + [ float for n in range(48)])
> > ...
> > > But if I use the following, it gives error:
> > >
> > > In [171]: b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(48)])
> > >
> ---------------------------------------------------------------------------
> > > ValueError                                Traceback (most recent call
> last)
> > >
> > And the above (without the usecols) did work if you explicitly typed
> dtype=('S10', float, float....)? That by itself would be quite weird,
> because the two should be completely equivalent.
> > What happens if you cast the generated list to a tuple -
> dtype=tuple(['S10'] + [ float for n in range(48)])?
> > If you are using a recent numpy version (1.6.0 or 1.6.1rc1), could you
> please file a bug report with complete machine info etc.? But I suspect this
> might be an older version, you should also be able to simply use
> 'usecols=range(49)' (without the tuple()). Either way, I cannot reproduce
> this behaviour with the current numpy version.
> >
> > > If I don't specify the dtype, it will not recognize the type of the
> first column (it displays as nan):
> > >
> > > In [172]:
> b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2))
> > >
> > > In [173]: b
> > > Out[173]:
> > > array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
> > >        (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
> > >        (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
> > >       dtype=[('TIMESTAMP', '<f8'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])
> > >
> > You _do_ have to specify 'dtype=None', since the default is
> > 'dtype=float', as I have remarked in my previous mail. If this does not
> > work, it could be a matter of the numpy version again - there were a number
> > of type conversion issues fixed between 1.5.1 and 1.6.0.
> > >
> > > Then the final question is, actually the '-999.0' in the data is a
> > > missing value, but I cannot display it as 'nan' by specifying the
> > > missing_values as '-999.0':
> > > but whether I set missing_values to -999.0 or use a dictionary,
> > > neither works...
> > ...
> > >
> > > Even this doesn't work (suppose 2 is our missing_value),
> > > In [184]: data = "1, 2, 3\n4, 5, 6"
> > >
> > > In [185]: np.genfromtxt(StringIO(data), delimiter=",",dtype="int,int,int",missing_values=2)
> > > Out[185]:
> > > array([(1, 2, 3), (4, 5, 6)],
> > >       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
> >
> > OK, same behaviour here - I found the only tests involving 'valid
> numbers' as missing_values use masked arrays; for regular ndarrays they seem
> to be ignored. I don't know if this is by design - the question is, what do
> you need to do with the data if you know ' -999' always means a missing
> value? You could certainly manipulate them after reading in...
> > If you have to convert them already on reading in, and using np.mafromtxt
> is not an option, your best bet may be to define a custom converter like
> (note you have to include any blanks, if present)
> >
> > conv = dict(((n, lambda s: s==' -999' and np.nan or float(s)) for n in
> range(1,49)))
> >
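A slightly more defensive variant of the converter idea above might look like the following sketch. It is not Derek's exact code: make_missing_converter is a hypothetical helper, the sentinel is compared numerically rather than as the string ' -999' (so '-999', ' -999' and '-999.0' are all caught), and the bytes handling is an assumption for numpy versions that pass bytes to converters. The resulting dict is meant to be passed as converters=conv to genfromtxt.

```python
import math

def make_missing_converter(sentinel=-999.0):
    """Build a converter that maps the sentinel value to NaN, else to float."""
    def convert(field):
        # Assumption: some numpy versions hand bytes to converters; decode them.
        if isinstance(field, bytes):
            field = field.decode("latin1")
        value = float(field)  # float() ignores surrounding blanks
        return float("nan") if value == sentinel else value
    return convert

# One converter per data column 1..48, mirroring range(1, 49) in the mail.
conv = {n: make_missing_converter(-999.0) for n in range(1, 49)}

# Try the converter on a few sample fields without touching any file.
values = [conv[1](field) for field in (" -999", "-999.0", "-1.028", b"0.167")]
print(values)
```

The numeric comparison avoids having to match the exact blanks in the file, at the cost of also treating any other spelling of the same number as missing.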
> > Cheers,
> >                                                Derek
> >
> >
> >
> >
> > --
> >
> ***********************************************************************************
> > Chao YUE
> > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> > UMR 1572 CEA-CNRS-UVSQ
> > Batiment 712 - Pe 119
> > 91191 GIF Sur YVETTE Cedex
> > Tel: (33) 01 69 08 77 30; Fax:01.69.08.77.16
> >
> ************************************************************************************
> >
> > <99Burn2003all_new.csv>
>
