[Numpy-discussion] load from text files Pull Request Review

Derek Homeier derek at astro.physik.uni-goettingen.de
Fri Sep 2 11:22:19 EDT 2011

On 30.08.2011, at 6:21PM, Chris.Barker wrote:

>> I've submitted a pull request for a new method for loading data from
>> text files into a record array/masked record array.
>> Click on the link for more info, but the general idea is to create a
>> regular expression for what entries should look like and loop over the
>> file, updating the regular expression if it's wrong. Once the types
>> are determined the file is loaded line by line into a pre-allocated
>> numpy array.
> nice stuff.
> Have you looked at my "accumulator" class, rather than pre-allocating? 
> Less the class itself than the ideas behind it. It's easy enough to do, 
> and would keep you from having to run through the file twice. The cost 
> of memory re-allocation as the array grows is very small.
> I've posted the code recently, but let me know if you want it again.

I agree it would make a very nice addition, and could complement my 
pre-allocation option for loadtxt. However, I've also been made aware that 
pre-allocation breaks streamed input etc., so the buffer.resize(…) methods 
in accumulator would be the better way to go there. 
For load table this is not quite as straightforward, though: type 
auto-detection, strictly done, requires scanning the entire input, because a 
column full of ints could still produce a float in the last row… 
I'd say one just has to accept that this kind of auto-detection is incompatible 
with input streams; and since the entire data has to be scanned first anyway, 
pre-allocating the array makes sense as well. 
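
To illustrate why the detection needs the full input, here is a minimal 
sketch in plain Python. It is not the code from the pull request; the names 
detect_type and detect_column_types are hypothetical, and it only 
distinguishes int, float and str. Each column starts with the narrowest type 
that parses its first entry and is widened whenever a later row does not fit.

```python
# Hypothetical helpers for illustration only; not the pull request's code.

def detect_type(field):
    """Return the narrowest of int, float, str that can parse `field`."""
    for caster in (int, float):
        try:
            caster(field)
            return caster
        except ValueError:
            pass
    return str


# Ordering used to widen a column type: int < float < str.
_RANK = {int: 0, float: 1, str: 2}


def detect_column_types(lines, delimiter=None):
    """Scan every line and return (column types, number of data rows)."""
    types = None
    nrows = 0
    for line in lines:
        fields = line.split(delimiter)
        if not fields:
            continue  # skip blank lines
        found = [detect_type(f) for f in fields]
        if types is None:
            types = found
        else:
            # Keep the wider of the type seen so far and the new one.
            types = [max(a, b, key=_RANK.get) for a, b in zip(types, found)]
        nrows += 1
    return types, nrows


# The second column looks like int until the last row.
sample = ["1 2 a", "3 4 b", "5 6.5 c"]
col_types, nrows = detect_column_types(sample)
```

Stopping after the first two rows would have typed the second column as int; 
only the last row widens it to float. The row count from the same pass is 
what a pre-allocated array would be sized with.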

For better consistency with what people have likely got used to from npyio, 
I'd recommend some minor changes:

make spaces the default delimiter

enable automatic decompression (given the modularity, could you simply 
use np.lib._datasource.open() like genfromtxt?)
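
As a rough idea of what automatic decompression means here, the following 
sketch picks an opener from the file extension using only the standard 
library. It is a stand-in for illustration, not how np.lib._datasource is 
implemented; the function name open_text and the utf-8 default are 
assumptions.

```python
import bz2
import gzip
import os


def open_text(path, encoding="utf-8"):
    """Open `path` for reading text, decompressing .gz and .bz2 files.

    Hypothetical helper: the choice is made from the extension only.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".gz":
        return gzip.open(path, "rt", encoding=encoding)
    if ext == ".bz2":
        return bz2.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

A reader built on such a helper can iterate over the returned file object 
line by line without caring whether the file on disk is compressed.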

Derek Homeier          Centre de Recherche Astrophysique de Lyon
ENS Lyon                                      46, Allée d'Italie
69364 Lyon Cedex 07, France                  +33 1133 47272-8894
