[Numpy-discussion] Fast Reading of ASCII files
chris.barker at noaa.gov
Mon Dec 12 12:22:16 EST 2011
On 12/11/11 8:40 AM, Ralf Gommers wrote:
> On Wed, Dec 7, 2011 at 7:50 PM, Chris.Barker <Chris.Barker at noaa.gov> wrote:
>> * If we have a good, fast ascii (or unicode?) to array reader, hopefully
>> it could be leveraged for use in the more complex cases. So that rather
>> than genfromtxt() being written from scratch, it would be a wrapper
>> around the lower-level reader.
> You seem to be contradicting yourself here. The more complex cases are
> Wes' 10% and why genfromtxt is so hairy internally. There's always a
> trade-off between speed and handling complex corner cases. You want both.
I don't think the version in my mind is contradictory (or at least not quite).
What I'm imagining is that a good, fast ASCII-to-numpy-array reader
could read a whole table in at once (the common, easy, fast case), but
it could also be used to read a file in snippets, a chunk at a time,
which could be leveraged to handle many of the more complex cases.
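A rough sketch of the "whole table or snippets" idea, assuming np.loadtxt as a stand-in for the hypothetical fast low-level reader (it accepts any list of lines, so it can parse a slice of a file as easily as the whole thing). read_chunks and read_sections are illustrative names, not an existing API, and the "#"-header section layout is just an assumed example of a "complex case" that a thin wrapper could handle:

```python
import io
import itertools

import numpy as np


def read_chunks(lines, chunk_size, **loadtxt_kwargs):
    """Yield successive 2-D arrays, each parsed from up to chunk_size lines.

    np.loadtxt stands in for the hypothetical fast reader here.
    """
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        # ndmin=2 keeps a one-line chunk as shape (1, ncols), not (ncols,)
        yield np.loadtxt(chunk, ndmin=2, **loadtxt_kwargs)


def read_sections(lines, marker="#"):
    """Split input at marker lines and bulk-parse each numeric section.

    The wrapper deals with the structure; the fast reader only ever
    sees runs of plain numeric lines.
    """
    sections = {}
    name, block = None, []
    for line in lines:
        if line.startswith(marker):
            if block:
                sections[name] = np.loadtxt(block, ndmin=2)
            name, block = line[len(marker):].strip(), []
        elif line.strip():
            block.append(line)
    if block:
        sections[name] = np.loadtxt(block, ndmin=2)
    return sections
```

For example, read_sections(io.StringIO("# station A\n1 2\n3 4\n# station B\n5 6\n")) returns a dict with a (2, 2) array under "station A" and a (1, 2) array under "station B", and np.vstack over the output of read_chunks reproduces a single whole-file read.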
I suppose there will always be cases where the user needs to write their
own converter from string to dtype, and there is simply no way to
leverage what I'm imagining to support that.
Hmm, maybe there is -- for instance, if a "record" consisted mostly of
standard, easy-to-parse numbers, but one field was some weird text that
needed custom parsing, we could read it with a structured dtype that uses
a string for that one weird field, and convert that field in a
post-processing step.
Maybe that wouldn't be any faster or easier, but it could be done...
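A minimal sketch of that post-processing approach, again with np.loadtxt standing in for the fast reader. The "DD:MM.MH" latitude format, the RECORD dtype, and the parse_lat and read_records helpers are all hypothetical, made up for illustration:

```python
import io

import numpy as np

# Hypothetical record layout: two plain floats plus one "weird" latitude
# field written as DD:MM.MH (e.g. 47:36.5N) that needs custom parsing.
RECORD = np.dtype([("x", "f8"), ("y", "f8"), ("lat", "U16")])


def parse_lat(s):
    """Convert a hypothetical 'DD:MM.MH' string to signed decimal degrees."""
    hemisphere = s[-1]
    degrees, minutes = s[:-1].split(":")
    value = float(degrees) + float(minutes) / 60.0
    return -value if hemisphere == "S" else value


def read_records(f):
    # Step 1: the bulk reader parses every field; the weird one stays a string.
    raw = np.loadtxt(f, dtype=RECORD, delimiter=",", ndmin=1)
    # Step 2: post-process only the string field with custom Python code.
    lat = np.array([parse_lat(s) for s in raw["lat"]], dtype="f8")
    return raw["x"], raw["y"], lat
```

np.loadtxt's converters argument could apply parse_lat per field during parsing instead, but that puts a Python call inside the parsing loop; keeping the weird field as a string lets the numeric columns go through the fast path untouched.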
Anyway, whether or not you can leverage it for the full-featured version,
I do think there is a need for a good, fast text file parser that handles
the 90% case.
Would anyone like to join/form a small working group to work on this?
Wes, I'd like to see your Cython version -- maybe a starting point?
Christopher Barker, Ph.D.
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov