Pierre GM wrote:
Ryan, FYI, over the last couple of weeks I've been coding an extension of loadtxt for better support of masked data, with the option to read column names from a header. Please find an example below (I also have unittests). Most of the work is actually inspired by matplotlib's mlab.csv2rec. It might be worth not duplicating efforts. Cheers, P.
Absolutely! Definitely don't want to duplicate effort here. What I see here meets a lot of what I was looking for. Here are some questions:

1) It looks like the function returns a structured array rather than a rec array, so fields are obtained through dictionary-style access. Since it's dictionary access, is there any reason the header needs to be munged to replace characters and reserved names? IIUC, csv2rec changes names because it returns a rec array, which uses attribute lookup, so all names need to be valid Python identifiers. That is not the case for a structured array.

2) Can we avoid the use of seek() in here? I just posted a patch to change the check to readline, which was the only file function used previously. This allowed the direct use of a file-like object returned by urllib2.urlopen().

3) In order to avoid breaking backwards compatibility, can we change the default for dtype to be float, as in loadtxt, and instead use some kind of special value ('auto'?) to turn on the automatic dtype determination?

I'm currently cooking up some of these changes myself, but thought I would see what you thought first.

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
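To illustrate question 1), here is a minimal sketch using only standard NumPy. The header names (`"wind speed"`, `"class"`, `"temp-C"`, `"pressure"`) and the helper `usable_as_attribute` are hypothetical. Dictionary-style indexing on a structured array accepts any field name, while `rec.<name>` attribute access on a rec array only works for names that are valid, non-reserved Python identifiers.

```python
import keyword
import numpy as np

# Hypothetical header names read verbatim from a CSV file: one has a space,
# one is a Python keyword, one has a hyphen, one is a plain identifier.
names = ["wind speed", "class", "temp-C", "pressure"]
dtype = np.dtype(list(zip(names, (float, int, float, float))))
data = np.array([(3.5, 1, 12.0, 1013.0), (7.25, 2, 9.5, 1009.5)], dtype=dtype)

# Indexing a structured array by field name works for any string,
# so these columns can be read back without renaming them.
speeds = data["wind speed"]
classes = data["class"]

def usable_as_attribute(name):
    """True if `name` could be written as rec.<name> in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)

# Only names passing this check are reachable as attributes of a recarray,
# which is why a reader that returns a recarray has to munge column names.
attr_ok = {n: usable_as_attribute(n) for n in names}
print(speeds, classes, attr_ok)
```

Under these assumptions, only a reader that promises attribute access needs to rename columns; a structured array can keep the header text as-is.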
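For question 2), a sketch of reading a header and data rows in one forward pass, using only `readline()` and iteration, so it also works on non-seekable streams such as the object returned by `urllib2.urlopen()`. The class `ReadlineOnly` and the function `read_table` are hypothetical stand-ins written for this illustration, not part of the proposed code; the comment-skipping rule is likewise an assumption.

```python
import io

class ReadlineOnly:
    """Hypothetical non-seekable stream: offers readline() and iteration,
    but deliberately no seek(), like a network response."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def readline(self):
        return self._buf.readline()

    def __iter__(self):
        # Call readline() until it returns the empty string (EOF).
        return iter(self.readline, "")

def read_table(fh, delimiter=","):
    """Parse a header line plus numeric rows without ever rewinding."""
    # Skip leading blank lines and '#' comments by reading forward only.
    line = fh.readline()
    while line and (not line.strip() or line.lstrip().startswith("#")):
        line = fh.readline()
    names = [n.strip() for n in line.split(delimiter)]
    # The remaining lines are consumed in a single pass.
    rows = [tuple(float(v) for v in row.split(delimiter))
            for row in fh if row.strip()]
    return names, rows

fh = ReadlineOnly("# station data\nwind speed,temp\n3.5,12\n7.25,9.5\n")
names, rows = read_table(fh)
```

Because the header is taken from the first non-comment line as it streams past, there is no need to go back to the start of the file once the names are known.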
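For question 3), a sketch of how a sentinel value could keep `dtype=float` as the backwards-compatible default while `dtype='auto'` opts in to per-column detection. `load_columns` and `_infer` are hypothetical names, and the int, then float, then string inference order is an assumption made for this example.

```python
import numpy as np

def _infer(values):
    """Return the first of int, float that parses every value, else str."""
    for conv in (int, float):
        try:
            for v in values:
                conv(v)
        except ValueError:
            continue
        return conv
    return str

def load_columns(rows, names, dtype=float):
    """Hypothetical reader: dtype=float keeps loadtxt-like output (a plain
    2-D float array); dtype='auto' builds a structured array with inferred
    per-column types."""
    if isinstance(dtype, str) and dtype == "auto":
        columns = list(zip(*rows))
        convs = [_infer(col) for col in columns]
        fields = []
        for name, conv, col in zip(names, convs, columns):
            if conv is str:
                # Size the string field to the longest value in the column.
                fields.append((name, "U%d" % max(len(v) for v in col)))
            else:
                fields.append((name, conv))
        records = [tuple(c(v) for c, v in zip(convs, row)) for row in rows]
        return np.array(records, dtype=np.dtype(fields))
    # Default path: every value converted to a single dtype, as loadtxt does.
    return np.array([[float(v) for v in row] for row in rows], dtype=dtype)

arr = load_columns([("1", "2.5", "a"), ("3", "4", "bc")],
                   ["n", "x", "label"], dtype="auto")
plain = load_columns([("1", "2"), ("3", "4")], ["a", "b"])
```

Checking `isinstance(dtype, str)` before comparing with `"auto"` avoids comparing a `np.dtype` object against an arbitrary string, so existing callers that pass real dtypes are unaffected.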