[Numpy-discussion] load from text files Pull Request Review

Fri Sep 2 11:50:54 EDT 2011

On 9/2/11 8:22 AM, Derek Homeier wrote:
> I agree it would make a very nice addition, and could complement my
> pre-allocation option for loadtxt - however there I've also been made
> aware that this approach breaks streamed input etc., so the buffer.resize(…)
> methods in accumulator would be the better way to go.

Good point, that would be nice.
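
To make the resize idea concrete, here is a minimal sketch of a growable buffer that works on streamed input, since it never needs to know the row count up front. The `Accumulator` class and `load_column` helper are hypothetical names for illustration, not the accumulator from the pull request; the only NumPy call relied on is the in-place `ndarray.resize`.

```python
import numpy as np


class Accumulator:
    """Hypothetical growable 1-D buffer (illustration only)."""

    def __init__(self, dtype=float, initial_capacity=16):
        # Start with at least one slot so that doubling always grows the buffer.
        self._buf = np.empty(max(1, initial_capacity), dtype=dtype)
        self._n = 0  # number of slots actually filled

    def append(self, value):
        if self._n == self._buf.shape[0]:
            # Grow geometrically, in place. refcheck=False is safe here
            # because the buffer is private to this object.
            self._buf.resize(2 * self._buf.shape[0], refcheck=False)
        self._buf[self._n] = value
        self._n += 1

    def to_array(self):
        # Return a copy trimmed to the filled part.
        return self._buf[:self._n].copy()


def load_column(lines, converter=float):
    """Read one value per line from any iterable of strings (a stream is fine)."""
    acc = Accumulator(dtype=converter)
    for line in lines:
        line = line.strip()
        if line:
            acc.append(converter(line))
    return acc.to_array()
```

Because `load_column` only iterates over `lines`, it works equally well with an open file, a pipe, or a generator, and the geometric growth keeps the number of reallocations logarithmic in the number of rows.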

> For load table this is not quite as straightforward, though, because the type
> auto-detection, strictly done, requires to scan the entire input, because a
> column full of int could still produce a float in the last row…

Hmmm -- it seems you could just as well build the array as you go,
and if you hit a type change in the input, reset and start again.

In my tests, I'm pretty sure that the time spent on file I/O and string
parsing swamps the time it takes to allocate memory and set the values.

So there is little cost, and for the common use case, it would be faster 
and cleaner.

There is a chance, of course, that you might have to re-wind and start 
over more than once, but I suspect that that is the rare case.
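
A rough sketch of the restart approach, under some loud assumptions: the input is a seekable file object, columns are whitespace-delimited, every row has the same number of fields, and the only candidate types are int, float and str. `load_with_restart` and `CONVERTERS` are made-up names for illustration, not anything in the pull request.

```python
import io

import numpy as np

# Candidate converters, narrowest first (an assumption for this sketch).
CONVERTERS = [int, float, str]


def load_with_restart(f):
    """Parse columns optimistically; rewind and restart when a type guess fails.

    Returns (list of per-column arrays, number of restarts).
    """
    levels = None  # index into CONVERTERS for each column
    restarts = 0
    while True:
        f.seek(0)
        columns = None
        ok = True
        for line in f:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            if levels is None:
                levels = [0] * len(tokens)  # start by assuming int everywhere
            if columns is None:
                columns = [[] for _ in tokens]
            for i, tok in enumerate(tokens):
                try:
                    columns[i].append(CONVERTERS[levels[i]](tok))
                except ValueError:
                    # The guess for this column was too narrow: widen it
                    # and throw away what has been parsed so far.
                    levels[i] += 1
                    ok = False
                    break
            if not ok:
                break
        if ok:
            return [np.array(c) for c in (columns or [])], restarts
        restarts += 1


# An int column that turns out to hold a float in the last row.
cols, n_restarts = load_with_restart(io.StringIO("1 2\n3 4\n5 6.5\n"))
```

In the example, the second column is only promoted to float after the last row is reached, so the file is read twice. A file whose columns all keep their first guessed type is read exactly once.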

> For better consistency with what people have likely got used to from npyio,
> I'd recommend some minor changes:
>
> make spaces the default delimiter

+1

> enable automatic decompression (given the modularity, could you simply
> use np.lib._datasource.open() like genfromtxt?)

I _think_ this would benefit from a one-pass solution as well -- so you
don't need to decompress twice.
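
For illustration, a one-pass read over compressed input could look something like the sketch below. It uses the standard-library `gzip` module as a stand-in for whatever opener the reader would really use, and `iter_rows` is a hypothetical helper that assumes whitespace-delimited, all-float rows.

```python
import gzip
import io


def iter_rows(fileobj):
    """Yield parsed rows from a gzip-compressed stream in a single pass."""
    # gzip.open accepts a filename or an existing binary file object;
    # mode="rt" decodes the decompressed bytes to text line by line.
    with gzip.open(fileobj, mode="rt") as text:
        for line in text:
            tokens = line.split()
            if tokens:
                yield [float(t) for t in tokens]


# Build a small compressed payload in memory and read it once.
payload = gzip.compress(b"1 2\n3 4.5\n")
rows = list(iter_rows(io.BytesIO(payload)))
```

Each line is decompressed and parsed exactly once, so a type-detection scheme that needs a second pass would have to either decompress again or keep the parsed tokens around.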

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov