[Numpy-discussion] Fast Reading of ASCII files

Chris Barker chris.barker at noaa.gov
Tue Dec 13 16:07:56 EST 2011


On Tue, Dec 13, 2011 at 11:29 AM, Bruce Southey <bsouthey at gmail.com> wrote:

> Reading data is hard and writing code that suits the diversity in the
> Numerical Python community is even harder!
>
>
yup

> Both loadtxt and genfromtxt functions (other functions are perhaps less
> important) perhaps need an upgrade to incorporate the new NA object.
>

yes, if we are satisfied that the new NA object is, in fact, the way of the
future.


> Here I think loadtxt is a better target than genfromtxt because, as I
> understand it, it assumes the user really knows the data. Whereas
> genfromtxt can ask the data for the appropriate format.
>
> So I agree that a new 'superfast custom CSV reader for well-behaved data'
> function would be rather useful, especially as a replacement for loadtxt.
> By that I mean reading data using a user-specified format that essentially
> follows the CSV format (http://en.wikipedia.org/wiki/Comma-separated_values);
> it needs to allow for the NA object, skipping lines, and user-defined
> delimiters.
>
>
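For readers less familiar with the two functions, here is a minimal sketch of the distinction Bruce describes, using the current NumPy API rather than the NA object under discussion. The sample data are made up. `loadtxt` is told the layout and dtype up front and expects every field to parse; `genfromtxt` fills empty float fields with NaN, and with `dtype=None` it infers a type for each column.

```python
import io

import numpy as np

# Made-up sample: a header line, comma delimiters, and one empty field.
text = "x,y,label\n1.0,2.5,a\n3.0,,b\n"

# loadtxt: the caller states the layout (delimiter, rows to skip, dtype).
# It expects every field to parse, so it is given a clean numeric table here.
clean = "x,y\n1.0,2.5\n3.0,4.0\n"
arr = np.loadtxt(io.StringIO(clean), delimiter=",", skiprows=1, dtype=np.float64)
print(arr)  # a 2x2 float array

# genfromtxt with its default float dtype: empty fields become NaN.
data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1, usecols=(0, 1))
print(data)  # the second row holds 3.0 and nan

# genfromtxt with dtype=None: column types are inferred from the data,
# and names=True takes the field names from the header line.
rec = np.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                    dtype=None, encoding="utf-8")
print(rec["x"], rec["label"])
```

The price of the inference is speed: `genfromtxt` has to look at the values before it can settle on a dtype, which is part of why a fast path for "well-behaved" data is attractive.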
I think that ideally, there could be one interface to reading tabular data
-- hopefully, it would be easy for the user to specify what they want, and
if they don't, the code would try to figure it out. Under the hood, the
"easy" cases would be special-cased to high-performance versions.

genfromtxt sure looks close as far as the API goes -- it just needs the "high
performance special cases" under the hood. It may be that the way it's
designed makes it very difficult to do that, though -- I haven't looked
closely enough to tell.
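One way to picture "one interface, with fast special cases under the hood" is a thin dispatcher: try a strict fast path first and fall back to the inferring reader when the data turn out not to be well-behaved. This is only a sketch: `read_table` is a hypothetical name, not a NumPy function, and using `np.loadtxt` as the "fast" path with `np.genfromtxt` as the fallback is an assumption for illustration.

```python
import io

import numpy as np


def read_table(text, delimiter=",", skip_header=0):
    """Hypothetical single entry point for tabular text (not a NumPy API).

    Tries a strict all-float path first; if any field fails to parse,
    falls back to genfromtxt with per-column type inference.
    """
    try:
        # Fast path: assume every field is a well-formed float.
        return np.loadtxt(io.StringIO(text), delimiter=delimiter,
                          skiprows=skip_header, dtype=np.float64)
    except ValueError:
        # Fallback: infer column types and fill empty fields.
        return np.genfromtxt(io.StringIO(text), delimiter=delimiter,
                             skip_header=skip_header, dtype=None,
                             encoding="utf-8")


fast = read_table("1,2\n3,4\n")             # plain 2-D float array
mixed = read_table("1.5,2.0,a\n3.0,,b\n")   # structured array, fields f0..f2
print(fast.dtype, mixed.dtype.names)
```

A drawback of try-then-fall-back is that a badly behaved file is parsed twice, at least partially; a real implementation would more likely inspect the first few lines to choose a path before reading the whole file.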

At least that's what I'm thinking at the moment.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov