Re: [Numpy-discussion] More loadtxt() changes

25 Nov 2008

On Nov 25, 2008, at 2:06 PM, Ryan May wrote:
...
1) It looks like the function returns a structured array rather than a
rec array, so that fields are obtained by doing a dictionary access.
Since it's a dictionary access, is there any reason that the header
needs to be munged to replace characters and reserved names?  IIUC,
csv2rec changes names b/c it returns a rec array, which uses attribute
lookup and hence all names need to be valid Python identifiers. This is
not the case for a structured array.
Personally, I prefer flexible ndarrays to recarrays, hence the output.  
However, I still think that names should be as clean as possible to  
avoid bad surprises down the road.
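
To make the distinction in point 1 concrete, here is a minimal sketch. The field names are made up for illustration. A structured array accepts any string as a field name because fields are looked up by key; a recarray view adds attribute lookup, which only helps when the name is a valid Python identifier.

```python
import numpy as np

# Hypothetical header containing a name that is not a valid identifier.
dt = np.dtype([("wind speed", float), ("year", int)])
arr = np.array([(3.5, 2008), (4.0, 2009)], dtype=dt)

# Structured array: key-based access works for any field name.
speeds = arr["wind speed"]

# Recarray view: attribute access works only for identifier-like names.
rec = arr.view(np.recarray)
years = rec.year
# "rec.wind speed" would be a syntax error; rec["wind speed"] still works.
```

So the unmunged name is usable in both cases, but only through key access, which is one reason cleaned-up names can still be convenient.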
...
2) Can we avoid the use of seek() in here?  I just posted a patch to
change the check to readline, which was the only file function used
previously.  This allowed the direct use of a file-like object returned
by urllib2.urlopen().
I coded that a couple of weeks ago, before you posted your patch and I  
didn't have time to check it. Yes, we could try getting rid of seek.  
However, we need to find a way to rewind to the beginning of the file  
if the dtypes are not given in input (as we parsed the whole file to  
find the best converter in that case).
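
One possible way around the rewind problem, sketched below with the standard library only, is to read the stream once with readline() and keep the lines in memory, so that both the type-detection pass and the conversion pass run over the buffer. All names here (read_lines_once, guess_converter, load_columns, ReadlineOnly) are hypothetical helpers for illustration, not the code under discussion.

```python
import io

def read_lines_once(fileobj):
    """Collect every line using only readline(), so no seek() is required."""
    lines = []
    while True:
        line = fileobj.readline()
        if not line:
            break
        lines.append(line)
    return lines

def guess_converter(values):
    """Return the strictest of int, float, str that accepts all values."""
    for conv in (int, float):
        try:
            for v in values:
                conv(v)
        except ValueError:
            continue
        return conv
    return str

def load_columns(fileobj, delimiter=","):
    rows = [line.strip().split(delimiter)
            for line in read_lines_once(fileobj) if line.strip()]
    columns = list(zip(*rows))
    # First pass over the buffer: pick a converter per column.
    converters = [guess_converter(col) for col in columns]
    # Second pass over the same buffer: convert, with no rewind needed.
    return [[conv(v) for v in col] for conv, col in zip(converters, columns)]

class ReadlineOnly:
    """Stand-in for a stream without seek(), such as a urlopen() response."""
    def __init__(self, text):
        self._buf = io.StringIO(text)
    def readline(self):
        return self._buf.readline()

data = load_columns(ReadlineOnly("1,2.5,a\n3,4,b\n"))
```

The trade-off is memory: the whole file is held as a list of strings, which may not be acceptable for very large inputs.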
...
3) In order to avoid breaking backwards compatibility, can we change the
default for dtype to be float32, and instead use some kind of special
value ('auto' ?) to use the automatic dtype determination?
I'm not especially concerned w/ backwards compatibility, because we're  
supporting masked values (something that np.loadtxt shouldn't have to  
worry about). Initially, I needed a replacement to the fromfile  
function in the scikits.timeseries.trecords package. I figured it'd be  
easier and more portable to get a function for generic masked arrays,  
that could be adapted afterwards to timeseries. In any case, I was  
more considering the functions I sent you to be part of some  
numpy.ma.io module than a replacement to np.loadtxt. I tried to get  
the syntax as close as possible to np.loadtxt and mlab.csv2rec, but  
there'll always be some differences.

So, yes, we could try to use a default dtype=float and yes, we could  
have an extra parameter 'auto'. But is it really that useful? I'm not  
sure (well, no, I'm sure it's not...)
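
For reference, the behaviour being proposed in point 3 could look roughly like the sketch below. The function name convert_column and the handling of 'auto' are assumptions made for illustration; this is not an existing numpy signature.

```python
def convert_column(values, dtype=float):
    """Convert strings with a fixed type, or guess one when dtype='auto'."""
    if dtype != "auto":
        # Backwards-compatible path: everything goes through one converter.
        return [dtype(v) for v in values]
    # 'auto' path: try the strictest type first, fall back to strings.
    for candidate in (int, float):
        try:
            return [candidate(v) for v in values]
        except ValueError:
            pass
    return list(values)

as_default = convert_column(["1", "2"])              # floats, as before
as_auto = convert_column(["1", "2"], dtype="auto")   # integers are kept
```

With the default, integer-looking input still comes back as floats, which is what existing callers would expect; only callers who ask for 'auto' get the guessed type.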
...
I'm currently cooking up some of these changes myself, but thought I
would see what you thought first.

Pierre GM