[Numpy-discussion] Speeding up loadtxt / savetxt

Chris Barker Chris.Barker at noaa.gov
Wed Apr 28 12:36:53 EDT 2010


Andreas Hilboll wrote:
> Yes, I know. But the files I create must be readable by an application
> developed in-house at our institute, and that only supports a) ASCII files
> or b) some home-grown binary format, which I hate.
> 
>> Also, an efficient reader for very simply formatted text is provided
>> by numpy.fromfile.
> 
> Yes, I heard about it. But the files I have to read have comments in them,
> and I didn't find a way to exclude these easily.

You can't do it with fromfile. I think it would be very useful to have
fromfile()-like functionality with a few more features: skipping comment
lines, and allowing non-whitespace delimiters when reading multiple lines.
See my past posts about this.
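To illustrate the feature set described above (not the C-level change to fromfile being proposed), here is a minimal Python sketch. It assumes the comment marker and delimiter are single known strings. The name `read_commented` and its parameters are hypothetical. It strips comments in Python, turns the delimiter into whitespace, and then hands the remaining text to `np.fromstring` with `sep=" "`. That separator matches any run of whitespace, newlines included, so values can span multiple lines.

```python
import io
import numpy as np

def read_commented(f, comments="#", delimiter=",", ncols=None):
    """Hypothetical helper: drop comments, then let numpy's parser do the rest."""
    # Keep only the part of each line before the comment marker.
    cleaned = (line.split(comments, 1)[0] for line in f)
    # Turn the delimiter into whitespace; sep=" " below matches any run of
    # whitespace (including newlines), so values may span multiple lines.
    text = " ".join(cleaned).replace(delimiter, " ").strip()
    data = np.fromstring(text, dtype=float, sep=" ")
    if ncols is not None:
        # Restore the row/column layout if the caller knows the column count.
        data = data.reshape(-1, ncols)
    return data

sample = io.StringIO("# header\n1,2,3\n4,5,6  # trailing\n")
print(read_commented(sample, ncols=3))  # a 2x3 float array: rows 1,2,3 and 4,5,6
```

The comment stripping runs in pure Python, so this will be slower than a native implementation. It only shows the intended behavior.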

I did spend a non-trivial amount of time looking into how to add these
features and fix some bugs along the way (again, see my past posts).
It turns out that the fromfile code is some pretty ugly C, a result of
supporting all numpy data types and of compatibility with traditional C
functions. That makes it a bit of a chore, at least for a lame C
programmer like me.

I'm still not sure what I'll do when I get some time to look at this 
again -- I may simply start from scratch with Cython.

It would be great if someone wanted to take it on.

> The time needed to read a 100M file is ~13 seconds, and to write it ~5 seconds,
> which is not too bad, but still too much ...

You might try running fromfile() on a file with no comments. That would
show you how much speed gain is possible. At some point, you're waiting
on the disk anyway.
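One way to run that comparison is sketched below. It assumes a synthetic comment-free file is representative of the real data: it writes random values with `np.savetxt`, then times `np.loadtxt` against `np.fromfile(..., sep=" ")` on the same file. `compare_readers` and its parameters are hypothetical. The timings depend entirely on the machine and disk, so no particular speedup is implied.

```python
import os
import tempfile
import time
import numpy as np

def compare_readers(nrows=100_000, ncols=5):
    """Hypothetical benchmark: time loadtxt vs fromfile on one comment-free file."""
    data = np.random.default_rng(0).random((nrows, ncols))
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        np.savetxt(path, data)  # space-delimited text, no comment lines
        t0 = time.perf_counter()
        a = np.loadtxt(path)
        t1 = time.perf_counter()
        # sep=" " matches any whitespace, so newlines between rows are fine.
        b = np.fromfile(path, dtype=float, sep=" ").reshape(-1, ncols)
        t2 = time.perf_counter()
    finally:
        os.remove(path)
    if not np.array_equal(a, b):
        raise ValueError("readers disagree on the parsed values")
    return t1 - t0, t2 - t1

t_load, t_from = compare_readers(nrows=10_000)
print(f"loadtxt: {t_load:.3f} s, fromfile: {t_from:.3f} s")
```

Because the file was just written, it is likely still in the OS cache. The numbers then mostly reflect parsing cost rather than disk speed.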

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


