Re: [Numpy-discussion] np.loadtxt : yet a new implementation...

Dec. 1, 2008

I agree, genloadtxt is a bit bloated, and it's no surprise that it's
slower than the initial one. I think that, to be fair, comparisons
should be made against matplotlib.mlab.csv2rec, which also implements
autodetection of the dtype. I'm quite in favor of keeping a lite
version around.

On Dec 1, 2008, at 4:47 PM, Stéfan van der Walt wrote:
...
...
> I haven't investigated the code in too much detail, but wouldn't it be
> possible to implement the current set of functionality in a
> base-class, which is then specialised to add the rest?  That way, one
> could always instantiate TextReader yourself for some added speed.
Well, one of the issues is that we need to keep the function
compatible w/ urllib.urlretrieve (Ryan, am I right?), which means not
being able to go back to the beginning of a file (no call to .seek).
Another issue comes from the option of defining the dtype
automatically: you need to keep track of the converters, then do a
second loop over the data. Those converters are likely the
bottleneck, as you need to check whether each value should be
interpreted as missing or not and respond appropriately.
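
To make the two-loop idea concrete, here is a minimal sketch. It is not
the genloadtxt code: the converter ladder (int, then float, then str),
the set of missing-value markers, and the names read_autotyped and
_upgrade are all assumptions made for illustration. The stream is read
once and never rewound; rows are buffered in memory, and the second
loop runs over that buffer rather than over the file.

```python
import io

# Hypothetical converter ladder: try the strictest type first, fall back to str.
CONVERTERS = [int, float, str]
MISSING = {"", "N/A", "NA"}  # assumed markers for missing values


def _upgrade(level, value):
    """Return the lowest converter index >= level that accepts `value`."""
    while level < len(CONVERTERS) - 1:
        try:
            CONVERTERS[level](value)
            return level
        except ValueError:
            level += 1
    return level


def read_autotyped(fh, delimiter=","):
    """Read a forward-only stream once, then convert the buffered rows."""
    rows, levels = [], None
    # First loop: consume the stream (no .seek) and track one converter per column.
    for line in fh:
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.rstrip("\n").split(delimiter)]
        if levels is None:
            levels = [0] * len(fields)
        for i, value in enumerate(fields):
            if value not in MISSING:  # missing values do not affect the type
                levels[i] = _upgrade(levels[i], value)
        rows.append(fields)
    # Second loop: apply the final converters to the rows held in memory.
    return [
        tuple(None if v in MISSING else CONVERTERS[lv](v)
              for v, lv in zip(row, levels))
        for row in rows
    ]


data = io.StringIO("1,2.5,a\n3,N/A,b\n")
result = read_autotyped(data)
```

Every field goes through a missing-value test and at least one trial
conversion in the first loop, then a real conversion in the second,
which is consistent with the converters being the likely bottleneck.
The sketch assumes every row has the same number of fields.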

I thought about creating a base class, with a specific subclass taking
care of the missing values. I found out it would have duplicated a lot
of code.
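
For reference, the base-class layout under discussion could look
roughly like the sketch below. TextReader is the name used in the
quoted message; MissingAwareReader, the convert hook, and all
parameters are hypothetical, and none of this is taken from the actual
genloadtxt code.

```python
class TextReader:
    """Hypothetical fast path: fixed converters, no missing-value checks."""

    def __init__(self, converters, delimiter=","):
        self.converters = converters
        self.delimiter = delimiter

    def convert(self, func, value):
        # Hook that subclasses may override.
        return func(value)

    def read(self, lines):
        out = []
        for line in lines:
            fields = line.strip().split(self.delimiter)
            out.append(tuple(self.convert(f, v)
                             for f, v in zip(self.converters, fields)))
        return out


class MissingAwareReader(TextReader):
    """Hypothetical subclass: same loop, plus a missing-value test per field."""

    def __init__(self, converters, delimiter=",", missing=("", "N/A"), fill=None):
        super().__init__(converters, delimiter)
        self.missing = set(missing)
        self.fill = fill

    def convert(self, func, value):
        if value in self.missing:
            return self.fill
        return func(value)


fast = TextReader([int, float]).read(["1,2.5"])
safe = MissingAwareReader([int, float]).read(["1,N/A"])
```

This toy version needs only one hook. It says nothing about whether
the real function, with dtype autodetection and its other options,
could be split as cleanly.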

In any case, I think that's secondary: we can always optimize pieces  
of the code afterwards. I'd like more feedback on corner cases and  
usage...

Pierre GM