[Numpy-discussion] load from text files Pull Request Review
Chris.Barker
Chris.Barker at noaa.gov
Fri Sep 2 11:50:54 EDT 2011
On 9/2/11 8:22 AM, Derek Homeier wrote:
> I agree it would make a very nice addition, and could complement my
> pre-allocation option for loadtxt - however there I've also been made
> aware that this approach breaks streamed input etc., so the buffer.resize(…)
> methods in accumulator would be the better way to go.
Good point, that would be nice.
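A rough sketch of the resize-based accumulator idea, assuming a single numeric column read from a stream of unknown length. The `Accumulator` class and its methods are hypothetical and are not the code under review:

```python
import io
import numpy as np


class Accumulator:
    """Hypothetical growable 1-D buffer: over-allocate, then grow in place
    with ndarray.resize() whenever it fills up."""

    def __init__(self, dtype=np.float64, initial_size=16):
        self._buffer = np.empty(initial_size, dtype=dtype)
        self._length = 0

    def append(self, value):
        capacity = self._buffer.shape[0]
        if self._length == capacity:
            # Double the capacity. refcheck=False is safe here only because
            # this class never hands out views of the live buffer.
            self._buffer.resize(max(2 * capacity, 1), refcheck=False)
        self._buffer[self._length] = value
        self._length += 1

    def finalize(self):
        # Copy out just the filled part, independent of the internal buffer.
        return self._buffer[: self._length].copy()


# Usage: values arrive one line at a time, so the total size is unknown up front.
acc = Accumulator(initial_size=2)
for line in io.StringIO("1.0\n2.5\n3.0\n4.0\n5.0\n"):
    acc.append(float(line))
values = acc.finalize()
```

Doubling keeps the number of reallocations logarithmic in the input size, and nothing requires seeking or knowing the line count in advance, which is what makes it usable for streamed input.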
> For load table this is not quite as straightforward, though, because the type
> auto-detection, strictly done, requires scanning the entire input, because a
> column full of int could still produce a float in the last row…
hmmm -- it seems you could just as well build the array as you go,
and if you hit a change in the input, reset and start again.
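A minimal sketch of the "build as you go, and start over if a column changes type" approach. It assumes a seekable file, whitespace-separated fields, a first line that sets the column count, and only int and float columns. `one_pass_load`, `CONVERTERS`, and `_Restart` are hypothetical names:

```python
import io
import numpy as np

# Converters tried for every column, narrowest first (an assumption of
# this sketch; a real loader would also handle strings, dates, etc.).
CONVERTERS = [(int, np.int64), (float, np.float64)]


class _Restart(Exception):
    """Signals that a column was widened and parsing must start over."""


def _parse_once(f, levels, delimiter):
    # One pass over the input using the current per-column converters.
    columns = [[] for _ in levels]
    for line in f:
        fields = line.split(delimiter)
        if not fields:
            continue  # skip blank lines
        for i, text in enumerate(fields):
            try:
                columns[i].append(CONVERTERS[levels[i]][0](text))
            except ValueError:
                if levels[i] + 1 == len(CONVERTERS):
                    raise  # no converter in the ladder fits this field
                levels[i] += 1  # widen just this column...
                raise _Restart  # ...and re-parse from the top
    return columns


def one_pass_load(f, delimiter=None):
    """Hypothetical loader: optimistically parse every column as int and
    rewind (f must be seekable) only when a later value does not fit."""
    levels = [0] * len(f.readline().split(delimiter))  # first line sets width
    while True:
        f.seek(0)
        try:
            columns = _parse_once(f, levels, delimiter)
            break
        except _Restart:
            continue
    return [np.array(col, dtype=CONVERTERS[lvl][1])
            for col, lvl in zip(columns, levels)]


# Usage: the float in the last row forces a single rewind of column 0.
cols = one_pass_load(io.StringIO("1 2\n3 4\n5.5 6\n"))
```

Each restart widens at least one column, so the number of extra passes is bounded by the number of columns times the number of widening steps; in the common case where the first guess holds, the data is parsed exactly once.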
In my tests, I'm pretty sure that the time spent on file I/O and string
parsing swamps the time it takes to allocate memory and set the values.
So there is little cost, and for the common case a one-pass approach
would be faster and cleaner.
There is a chance, of course, that you might have to rewind and start
over more than once, but I suspect that is rare.
> For better consistency with what people have likely got used to from npyio,
> I'd recommend some minor changes:
>
> make spaces the default delimiter
+1
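For reference, the whitespace default in `np.loadtxt` means any run of whitespace, which is not the same as splitting on a single space character. A small illustration, assuming that is the behavior wanted for the new loader:

```python
import io
import numpy as np

row = "1  2\t3"
# Splitting on one literal space leaves an empty field between the two
# spaces and does not split on the tab.
literal = row.split(" ")   # ['1', '', '2\t3']
# str.split() with no argument treats any run of whitespace as one delimiter.
runs = row.split()         # ['1', '2', '3']

# np.loadtxt's default (delimiter=None) behaves like the second case.
arr = np.loadtxt(io.StringIO("1  2\t3\n4 5   6\n"))
```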
> enable automatic decompression (given the modularity, could you simply
> use np.lib._datasource.open() like genfromtxt?)
I _think_ this would benefit from a one-pass solution as well -- so you
don't need to decompress twice.
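A sketch of the single-pass idea for compressed input. It swaps in the standard-library `gzip` and `bz2` modules for `np.lib._datasource.open()` and picks the decompressor from the file extension, both assumptions of this sketch; `open_maybe_compressed` and `load_floats_single_pass` are hypothetical names. Each line is decompressed once and parsed immediately, so there is no second decompression pass:

```python
import bz2
import gzip
import numpy as np

# Stand-in for np.lib._datasource.open(): choose a decompressor from the
# file extension using only the standard library.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_maybe_compressed(path):
    for ext, opener in _OPENERS.items():
        if path.endswith(ext):
            return opener(path, "rt")  # text mode, decompressed on the fly
    return open(path, "r")


def load_floats_single_pass(path):
    """Hypothetical helper: stream each decompressed line exactly once,
    collecting whitespace-separated floats as we go."""
    rows = []
    with open_maybe_compressed(path) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(x) for x in fields])
    return np.array(rows)
```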
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov