[Numpy-discussion] More loadtxt() changes

Tue Nov 25 14:06:30 EST 2008

Pierre GM wrote:
> Ryan,
> FYI,  I've been coding over the last couple of weeks an extension of 
> loadtxt for a better support of masked data, with the option to read 
> column names in a header. Please find an example below (I also have 
> unittest). Most of the work is actually inspired from matplotlib's 
> mlab.csv2rec. It might be worth not duplicating efforts.
> Cheers,
> P.

Absolutely!  Definitely don't want to duplicate effort here.  What I see 
here meets a lot of what I was looking for.  Here are some questions:

1) It looks like the function returns a structured array rather than a 
rec array, so that fields are obtained by doing a dictionary access. 
Since it's a dictionary access, is there any reason that the header 
needs to be munged to replace characters and reserved names?  IIUC, 
csv2rec changes names b/c it returns a rec array, which uses attribute 
lookup and hence all names need to be valid python identifiers.  This is 
not the case for a structured array.
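
A minimal sketch of the distinction in point 1, using made-up field names and data (nothing here is taken from Pierre's code). Key-based access on a structured array seems to accept names that could never be written as attributes:

```python
import numpy as np

# Illustrative field names: one contains a space, the other is a
# reserved word, so neither is a valid Python identifier.
dt = np.dtype([("wind speed", "f8"), ("class", "i4")])
data = np.array([(3.5, 1), (7.25, 2)], dtype=dt)

# Structured arrays look fields up by string key, so any name works.
speeds = data["wind speed"]
classes = data["class"]

# A record-array view still supports key access; only attribute
# access (rec.some_name) requires identifier-safe names.
rec = data.view(np.recarray)
rec_speeds = rec["wind speed"]
```

So the name munging would only matter if the result is meant to be used through attribute lookup.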

2) Can we avoid the use of seek() in here?  I just posted a patch to 
change the check to readline, which was the only file function used 
previously.  This allowed the direct use of a file-like object returned 
by urllib2.urlopen().
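
To illustrate point 2, here is a rough sketch of a reader that relies on readline() alone. LineOnlyStream and read_rows are hypothetical names invented for this example; they are not part of loadtxt or of my patch. The stand-in class deliberately has no seek(), to mimic a stream that can only be read forward:

```python
import io


class LineOnlyStream:
    """Hypothetical stand-in for a forward-only stream: readline() only."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def readline(self):
        return self._buf.readline()


def read_rows(fh, delimiter=","):
    """Parse numeric rows using nothing but fh.readline()."""
    if not hasattr(fh, "readline"):
        raise TypeError("expected an object with a readline() method")
    rows = []
    while True:
        line = fh.readline()
        if not line:  # an empty string signals end of stream
            break
        line = line.strip()
        if line:  # skip blank lines
            rows.append([float(x) for x in line.split(delimiter)])
    return rows


rows = read_rows(LineOnlyStream("1,2\n3,4\n"))
```

Anything that needs a second pass over the data (for example, type detection) would have to buffer the lines it has already read instead of seeking back.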

3) In order to avoid breaking backwards compatibility, can we change the 
default for dtype to be float, as it is in loadtxt today, and instead use 
some kind of special value ('auto' ?) to request the automatic dtype 
determination?
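
A sketch of what I have in mind for point 3, assuming a hypothetical parse() helper and a very naive int/float/string guesser (neither is the real loadtxt code, and the 'auto' spelling is only a suggestion). The default stays a plain float array, and type detection is opt-in:

```python
import numpy as np


def guess_type(values):
    """Return int, float or str: the first type that parses every value."""
    for caster in (int, float):
        try:
            for v in values:
                caster(v)
            return caster
        except ValueError:
            continue
    return str


def parse(lines, dtype=float, delimiter=","):
    """Hypothetical loader: float by default, per-column types on 'auto'."""
    rows = [ln.strip().split(delimiter) for ln in lines if ln.strip()]
    if isinstance(dtype, str) and dtype == "auto":
        columns = list(zip(*rows))
        types = [guess_type(col) for col in columns]
        fields = []
        for i, (typ, col) in enumerate(zip(types, columns)):
            if typ is str:
                # Size the string field to the longest entry in the column.
                fields.append(("f%d" % i, "U%d" % max(len(v) for v in col)))
            else:
                fields.append(("f%d" % i, typ))
        converted = [tuple(t(v) for t, v in zip(types, row)) for row in rows]
        return np.array(converted, dtype=np.dtype(fields))
    # Backwards-compatible path: one homogeneous dtype for everything.
    return np.array(rows).astype(dtype)


plain = parse(["1,2", "3,4"])
mixed = parse(["1,2.5,a", "3,4,bc"], dtype="auto")
```

Existing callers that pass no dtype would keep getting a float array, while dtype='auto' would return a structured array with one field per column.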

I'm currently cooking up some of these changes myself, but thought I 
would see what you thought first.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma