[Numpy-discussion] load from text files Pull Request Review

27 Aug 2011

Hi--

I've submitted a pull request for a new method for loading data from
text files into a record array/masked record array.

https://github.com/numpy/numpy/pull/143

Click on the link for more info, but the general idea is to build a
regular expression describing what the entries in each column should
look like, then loop over the file, updating the regular expression
whenever an entry doesn't match it. Once the types are determined, the
file is loaded line by line into a pre-allocated numpy array.
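The type-inference loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in the pull request. The function name `infer_column_types`, the comma delimiter, and the int -> float -> str promotion ladder are all assumptions. Each pattern has to accept everything the patterns before it accept. Otherwise promotion could leave a column typed too narrowly.

```python
import re

# Hypothetical promotion ladder, most restrictive type first. Each pattern
# accepts everything the earlier ones accept, so promotion only widens.
# These patterns are illustrative, not the ones used in the pull request.
TYPE_PATTERNS = [
    ("int64", re.compile(r"[-+]?\d+")),
    ("float64", re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")),
    ("str", re.compile(r".*")),  # matches any single-line field
]


def infer_column_types(lines, delimiter=r"\s*,\s*"):
    """Return one type name per column by promoting each column's type
    until its regex matches every entry seen so far."""
    split = re.compile(delimiter)
    levels = None  # current index into TYPE_PATTERNS for each column
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = split.split(line)
        if levels is None:
            levels = [0] * len(fields)
        if len(fields) != len(levels):
            raise ValueError(f"expected {len(levels)} fields, got {len(fields)}")
        for col, field in enumerate(fields):
            # Move down the ladder until the current regex accepts the entry.
            while not TYPE_PATTERNS[levels[col]][1].fullmatch(field):
                levels[col] += 1
    return [TYPE_PATTERNS[level][0] for level in (levels or [])]


print(infer_column_types(["1, 2.5, a", "3, 4, b", "-7, 1e3, 9"]))
```

Run on the three sample lines, this prints `['int64', 'float64', 'str']`. The second column is promoted to float on its first entry, and the third falls through to str because `a` matches neither numeric pattern.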

Compared to genfromtxt, this function has several actual or potential
advantages:

*More modular (genfromtxt is a rather large, nearly 500-line,
monolithic function. In my pull request no individual method is longer
than around 80 lines, and they're fairly self-contained.)
*delimiters can be specified via regexes
*missing data can be specified via regexes
*it's a bit simpler and has sensible defaults
*it actually works on some (unfortunately proprietary) data that
genfromtxt doesn't seem robust enough for
*it supports datetimes
*fairly extensible for the power user
*makes two passes through the file: the first determines the types (and
the sizes of string fields), and the second reads in the data into an
array pre-allocated from what the first pass found. So there's no giant
memory bloat when reading large text files
*fairly fast, though I think there is plenty of room for optimizations
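The two-pass strategy in the list above, together with regex delimiters and regex-specified missing data, can be sketched roughly like this. It's a minimal sketch under stated assumptions, not the pull request's `loadtable`. The name `load_table_sketch`, the default delimiter and missing-value patterns, the `f0, f1, ...` field names, and the int/float/str-only typing (no datetimes) are all hypothetical. Pass one counts rows, infers types, and measures string widths. Pass two fills a structured array allocated once up front, and masks the missing entries.

```python
import re

import numpy as np

# Hypothetical sketch; names, defaults, and type rules are assumptions.
INT_RE = re.compile(r"[-+]?\d+")
FLOAT_RE = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def load_table_sketch(path, delimiter=r"\s*,\s*", missing=r"NA|N/A|"):
    split = re.compile(delimiter)
    missing_re = re.compile(missing)

    # Pass 1: count rows, infer a type per column (0=int, 1=float, 2=str),
    # and record the longest entry so string fields can be sized exactly.
    n_rows, levels, widths = 0, None, None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            fields = split.split(line)
            if levels is None:
                levels = [0] * len(fields)
                widths = [1] * len(fields)
            n_rows += 1
            for col, field in enumerate(fields):
                if missing_re.fullmatch(field):
                    continue  # missing entries say nothing about the type
                if levels[col] == 0 and not INT_RE.fullmatch(field):
                    levels[col] = 1
                if levels[col] == 1 and not FLOAT_RE.fullmatch(field):
                    levels[col] = 2
                widths[col] = max(widths[col], len(field))
    if levels is None:
        raise ValueError("no data rows found")

    dtype = np.dtype([
        (f"f{col}", "i8" if lvl == 0 else "f8" if lvl == 1 else f"U{widths[col]}")
        for col, lvl in enumerate(levels)
    ])
    # Allocate once; the second pass never grows a list of rows.
    data = np.zeros(n_rows, dtype=dtype)
    mask = np.zeros(n_rows, dtype=np.ma.make_mask_descr(dtype))

    # Pass 2: convert each field directly into its pre-allocated slot.
    converters = [int, float, str]
    row = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            for col, field in enumerate(split.split(line)):
                name = dtype.names[col]
                if missing_re.fullmatch(field):
                    mask[name][row] = True  # keep the zero placeholder, mask it
                else:
                    data[name][row] = converters[levels[col]](field)
            row += 1
    return np.ma.masked_array(data, mask=mask)
```

For a file containing `1, 2.5, apple`, `3, NA, b`, and `-7, 1e3, kiwi`, this sketch yields fields of type int64, float64, and `U5`, with only the `NA` entry masked. Reading the file twice trades extra I/O for a single exact-size allocation, which is the memory point made in the list.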

All that said, it's entirely possible that the innards which determine
the type should be ripped out and submitted as a function on their
own.

I'd love suggestions for improvements, as well as suggestions for a
better name. (Currently it's called loadtable, which I don't really
like. It was just a working name.)

-Chris Jordan-Squire

Christopher Jordan-Squire