
On 9/2/11 8:22 AM, Derek Homeier wrote:
I agree it would make a very nice addition, and could complement my pre-allocation option for loadtxt - however there I've also been made aware that this approach breaks streamed input etc., so the buffer.resize(…) methods in accumulator would be the better way to go.
Good point, that would be nice.
For load table this is not quite as straightforward, though: strict type auto-detection requires scanning the entire input, because a column full of ints could still produce a float in the last row…
hmmm -- it seems you could just as well build the array as you go, and if you hit a change of type in the input, reset and start again. In my tests, I'm pretty sure that the time spent on file I/O and string parsing swamps the time it takes to allocate memory and set the values. So there is little cost, and for the common use case, it would be faster and cleaner. There is a chance, of course, that you might have to rewind and start over more than once, but I suspect that is the rare case.
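To make the restart idea concrete, here is a rough sketch for a single numeric column. It is only an illustration: read_column and its candidate list are hypothetical names, not anything in NumPy or in the proposed reader, and it assumes a seekable input with one value per line.

```python
import io
import numpy as np

# Hypothetical candidate list: (Python converter, NumPy dtype), narrowest first.
CANDIDATES = ((int, np.int64), (float, np.float64))

def read_column(stream, candidates=CANDIDATES):
    """Fill an array while reading; on a parse failure, rewind and retry
    with the next (wider) type. Returns (array, number_of_restarts)."""
    for restarts, (convert, dtype) in enumerate(candidates):
        stream.seek(0)                     # rewind: needs a seekable input
        buf = np.empty(16, dtype=dtype)    # small buffer, grown on demand
        n = 0
        try:
            for line in stream:
                text = line.strip()
                if not text:
                    continue               # skip blank lines
                value = convert(text)      # raises ValueError on a type change
                if n == buf.size:
                    # Double the buffer in place; buf owns its data and
                    # has no views, so skipping the refcheck is safe here.
                    buf.resize(2 * buf.size, refcheck=False)
                buf[n] = value
                n += 1
        except ValueError:
            continue                       # start over with the next type
        return buf[:n].copy(), restarts
    raise ValueError("no candidate type could parse the input")

ints, r0 = read_column(io.StringIO("1\n2\n3\n"))
mixed, r1 = read_column(io.StringIO("1\n2\n3.5\n"))
```

In the common all-int case the input is read exactly once; the float in the last row of the second input costs one rewind. The rewind is also what would not work on a pure stream, so a real version would have to buffer or convert in place instead.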
For better consistency with what people have likely got used to from npyio, I'd recommend some minor changes:
make spaces the default delimiter
+1
enable automatic decompression (given the modularity, could you simply use np.lib._datasource.open() like genfromtxt?)
I _think_ this would benefit from a one-pass solution as well -- so you don't need to decompress twice.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov