[Numpy-discussion] Speeding up loadtxt / savetxt
Chris Barker
Chris.Barker at noaa.gov
Wed Apr 28 12:36:53 EDT 2010
Andreas Hilboll wrote:
> Yes, I know. But the files I create must be readable by an application
> developed in-house at our institute, and that only supports a) ASCII files
> or b) some home-grown binary format, which I hate.
>
>> Also, an efficient reader for very simply formatted text is provided
>> by numpy.fromfile.
>
> Yes, I heard about it. But the files I have to read have comments in them,
> and I didn't find a way to exclude these easily.
You can't do it with fromfile -- I think it would be very useful to have
fromfile()-like functionality with a few more features: comment lines,
and non-whitespace delimiters when reading multiple lines. See
my posts about this in the past.
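
In the meantime, one possible workaround is to strip the comments in
Python and pass only the remaining numbers to NumPy's simple text parser.
This is a rough sketch, not a fromfile feature: the function name
read_commented_text and its comment and ncols parameters are made up for
illustration, and it assumes whitespace-delimited floats and a file small
enough to hold in memory as a string.

```python
import numpy as np

def read_commented_text(path, comment="#", ncols=None):
    """Strip comments, then parse the remaining numbers in one pass.

    Hypothetical helper: assumes whitespace-delimited floats.
    """
    cleaned = []
    with open(path) as f:
        for line in f:
            # Keep only the part of the line before the comment marker.
            data = line.split(comment, 1)[0].strip()
            if data:
                cleaned.append(data)
    # sep=" " selects text mode and matches any run of whitespace,
    # so a single flat string is enough for the parser.
    values = np.fromstring(" ".join(cleaned), dtype=float, sep=" ")
    return values.reshape(-1, ncols) if ncols else values
```

The comment stripping still runs in a Python loop, so this will not reach
the speed of fromfile on a clean file, but the numeric conversion itself
is done in C.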
I did spend a non-trivial amount of time looking into how to add these
features, and fix some bugs in the process -- again, see my posts in the
past. It turns out that the fromfile code is some pretty ugly C--a
result of supporting all numpy data types, and compatibility with
traditional C functions--so it's a bit of a chore, at least for a lame C
programmer like me.
I'm still not sure what I'll do when I get some time to look at this
again -- I may simply start from scratch with Cython.
It would be great if someone wanted to take it on.
> Time needed to read a 100M file is ~13 seconds, and to write ~5 seconds.
> Which is not too bad, but also still too much ...
You might try running fromfile() on a file with no comments, and you
could see from that how much speed gain is possible -- at some point,
you're waiting on the disk anyway.
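
A quick way to run that comparison might look like the sketch below. It
writes a comment-free file of random numbers and times loadtxt against
fromfile in text mode (sep=" "). The function name time_readers and the
array sizes are arbitrary choices for illustration, and the timings will
depend on the machine, the NumPy version, and whether the file is already
in the OS cache.

```python
import os
import tempfile
import time

import numpy as np

def time_readers(nrows=20000, ncols=5):
    """Time loadtxt vs. text-mode fromfile on a comment-free file."""
    data = np.random.rand(nrows, ncols)
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        np.savetxt(path, data)  # plain whitespace-delimited text
        t0 = time.perf_counter()
        a = np.loadtxt(path)
        t1 = time.perf_counter()
        # fromfile returns a flat array, so restore the column layout.
        b = np.fromfile(path, sep=" ").reshape(-1, ncols)
        t2 = time.perf_counter()
    finally:
        os.remove(path)
    return a, b, t1 - t0, t2 - t1

if __name__ == "__main__":
    a, b, t_loadtxt, t_fromfile = time_readers()
    print("loadtxt:  %.3f s" % t_loadtxt)
    print("fromfile: %.3f s" % t_fromfile)
```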
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov