[Numpy-discussion] Fast Reading of ASCII files

Chris.Barker chris.barker at noaa.gov
Mon Dec 12 12:22:16 EST 2011

On 12/11/11 8:40 AM, Ralf Gommers wrote:
> On Wed, Dec 7, 2011 at 7:50 PM, Chris.Barker <Chris.Barker at noaa.gov> wrote:
>     * If we have a good, fast ascii (or unicode?) to array reader, hopefully
>     it could be leveraged for use in the more complex cases. So that rather
>     than genfromtxt() being written from scratch, it would be a wrapper
>     around the lower-level reader.
> You seem to be contradicting yourself here. The more complex cases are
> Wes' 10% and why genfromtxt is so hairy internally. There's always a
> trade-off between speed and handling complex corner cases. You want both.

I don't think the version in my mind is contradictory (not quite, anyway).

What I'm imagining is that a good, fast ascii to numpy array reader
could read a whole table in at once (the common, easy, fast case), but
it could also be used to read a file one snippet at a time, which
could be leveraged to handle many of the more complex cases.
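
A rough sketch of that two-mode usage, with np.loadtxt standing in for the fast reader (which doesn't exist yet). The helper name read_rows and the sample data are made up for illustration; the point is only that the same call can read a whole table or just the next few lines of an open file:

```python
import io
from itertools import islice

import numpy as np


def read_rows(handle, nrows, dtype=float):
    """Read up to `nrows` lines from an open text handle into a 2-D array.

    Hypothetical helper: np.loadtxt is only a stand-in for the proposed
    fast reader. The handle is left positioned after the lines consumed,
    so the caller can keep reading from where this call stopped.
    """
    lines = list(islice(handle, nrows))
    if not lines:
        # Nothing left to read.
        return np.empty((0, 0), dtype=dtype)
    return np.loadtxt(lines, dtype=dtype, ndmin=2)


text = "1 2 3\n4 5 6\n7 8 9\n10 11 12\n"

# The common, easy case: the whole table at once.
whole = np.loadtxt(io.StringIO(text))

# The snippet case: a higher-level wrapper pulls a few rows at a time
# and is free to do something else with the file in between.
handle = io.StringIO(text)
first_two = read_rows(handle, 2)
rest = read_rows(handle, 10)
```

A genfromtxt-style wrapper could call something like read_rows() between whatever special handling it needs (headers, comment blocks, a change of layout partway through the file).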

I suppose there will always be cases where the user needs to write their
own converter from string to dtype, and there is simply no way to
leverage what I'm imagining to support that.

Hmm, maybe there is -- for instance, if a "record" consisted of mostly
standard, easy-to-parse numbers, but one field was some weird text that
needed custom parsing, we could read it with a structured dtype, with a
string for that one weird field, and that could be converted in a
post-processing step.
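
Something like the following, again using np.loadtxt as a stand-in for the reader. The field names, the "1h30m" format, and parse_elapsed are all invented for the example:

```python
import io
import re

import numpy as np

# Two ordinary float fields plus one field kept as raw text.
raw_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("elapsed", "U16")])


def parse_elapsed(text):
    """Convert strings like '1h30m' or '45m' to whole minutes.

    Hypothetical custom converter for the one "weird" field.
    """
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if match is None:
        raise ValueError("unrecognized elapsed time: %r" % text)
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return 60 * hours + minutes


data = io.StringIO("12.5,3.0,1h30m\n7.25,1.5,45m\n")

# Step 1: the generic reader handles the numbers and passes the odd
# field through untouched as a string.
raw = np.loadtxt(data, dtype=raw_dtype, delimiter=",")

# Step 2: post-process only the string field in Python.
elapsed_minutes = np.array([parse_elapsed(s) for s in raw["elapsed"]])
```

The numeric columns never go through Python-level converters here; only the one text column does.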

Maybe that wouldn't be any faster or easier, but it could be done...

Anyway, whether you can leverage it for the full-featured version or
not, I do think there is call for a good, fast text file parser that
handles the 90% case.

Would anyone like to join/form a small working group to work on this?

Wes, I'd like to see your Cython version -- maybe a starting point?


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
