[Numpy-discussion] More loadtxt() changes
Ryan May
rmay31 at gmail.com
Tue Nov 25 14:06:30 EST 2008
Pierre GM wrote:
> Ryan,
> FYI, over the last couple of weeks I've been coding an extension of
> loadtxt for better support of masked data, with the option to read
> column names from a header. Please find an example below (I also have
> unit tests). Most of the work is actually inspired by matplotlib's
> mlab.csv2rec. It might be worth not duplicating effort.
> Cheers,
> P.
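Pierre's example is not reproduced in this archive. As a hedged illustration of the two features he describes (reading column names from a header line and masking missing values), NumPy's np.genfromtxt, which appeared in later releases, exposes comparable `names` and `usemask` options. The CSV text below is made up for the sketch.

```python
import io

import numpy as np

# Made-up CSV text: a header line, then a row with a missing value in column b.
text = "a,b\n1,\n3,4\n"

# names=True takes the field names from the first line; usemask=True returns
# a masked array in which missing entries (empty strings by default) are masked.
table = np.genfromtxt(io.StringIO(text), delimiter=",", names=True, usemask=True)

print(table.dtype.names)  # field names read from the header
print(table["b"].mask)    # the empty entry in column b is masked
```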
Absolutely! I definitely don't want to duplicate effort here. What you
have covers a lot of what I was looking for. A few questions:
1) It looks like the function returns a structured array rather than a
rec array, so fields are accessed by dictionary-style indexing. Since
it's a dictionary access, is there any reason the header names need to
be munged to replace characters and reserved names? IIUC, csv2rec
changes names because it returns a rec array, which uses attribute
lookup, so all names need to be valid Python identifiers. That is not
the case for a structured array.
2) Can we avoid using seek() here? I just posted a patch that changes
the check to use readline(), which was the only file method loadtxt
used previously. Relying only on readline() allowed a file-like object
returned by urllib2.urlopen() to be passed in directly.
3) To avoid breaking backwards compatibility, can we keep float as the
default dtype (i.e. float64, as in the current loadtxt), and instead use
some kind of special value ('auto'?) to request automatic dtype
determination?
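For point 1, a minimal illustration with made-up field names: a structured array accepts names that are not valid identifiers (one contains a space, one is the keyword `class`) and retrieves them by dictionary-style indexing, while attribute access on a rec array can only spell out names that are legal identifiers.

```python
import numpy as np

# Field names that are not valid Python identifiers: one has a space,
# the other is a reserved word.
data = np.array([(1.5, 10), (2.5, 20)],
                dtype=[("wind speed", float), ("class", int)])

# Dictionary-style indexing works for any field name.
speeds = data["wind speed"]
classes = data["class"]

# Viewing as a rec array adds attribute lookup, but `rec.wind speed` and
# `rec.class` cannot even be written as Python syntax, which is why a
# function returning rec arrays has to rename such fields.
rec = data.view(np.recarray)
print(rec["wind speed"])  # indexing still works on the rec array
```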
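For point 2, a minimal sketch of a reader that never calls seek(): it consumes the header with one readline() and then iterates over the remaining lines of the same stream. ForwardOnlyStream and read_with_header are hypothetical names; the stream class only stands in for a non-seekable object such as the one urllib2.urlopen() returns, and the parsing assumes comma-separated floats.

```python
import numpy as np


class ForwardOnlyStream:
    """Hypothetical non-seekable file-like object: readline() and iteration
    work, but seek() raises, as with a network stream."""

    def __init__(self, text):
        self._lines = iter(text.splitlines(True))

    def readline(self):
        return next(self._lines, "")

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._lines)

    def seek(self, *args):
        raise OSError("stream is not seekable")


def read_with_header(fh, delimiter=","):
    """Read column names from the first line, then the data, without seek()."""
    # One readline() consumes the header; nothing ever needs rewinding.
    names = [n.strip() for n in fh.readline().split(delimiter)]
    # Parse the remaining lines from the same, already-advanced stream.
    rows = [tuple(float(v) for v in line.split(delimiter))
            for line in fh if line.strip()]
    return np.array(rows, dtype=[(n, float) for n in names])


arr = read_with_header(ForwardOnlyStream("a,b\n1,2\n3,4\n"))
print(arr["a"])
```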
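For point 3, one way to keep backwards compatibility is a sentinel value: the default stays float, and only the string 'auto' turns on inference. The sketch assumes comma-separated text and a simple rule that tries int, then float, then falls back to strings; load_columns and infer_column are hypothetical names, not part of loadtxt.

```python
import numpy as np


def infer_column(values):
    """Pick the first of int or float that parses every value, else strings."""
    for kind in (int, float):
        try:
            return np.array([kind(v) for v in values])
        except ValueError:
            pass
    return np.array(values)


def load_columns(lines, dtype=float, delimiter=","):
    """Hypothetical loader: dtype defaults to float, and the special value
    'auto' switches on per-column type inference."""
    rows = [line.strip().split(delimiter) for line in lines if line.strip()]
    columns = list(zip(*rows))
    # Check the type first so a numpy dtype object is never compared to a str.
    if isinstance(dtype, str) and dtype == "auto":
        return [infer_column(col) for col in columns]
    return [np.array(col).astype(dtype) for col in columns]


numeric = load_columns(["1,2.5", "3,4.0"])
mixed = load_columns(["1,2.5,x", "3,4.0,y"], dtype="auto")
```

Because existing calls never pass 'auto', they keep getting float columns; only callers who opt in get the inferred types.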
I'm currently cooking up some of these changes myself, but thought I
would see what you thought first.
Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma