[Numpy-discussion] fromfile() for reading text (one more time!)

Christopher Barker Chris.Barker at noaa.gov
Tue Jan 5 12:32:01 EST 2010

josef.pktd at gmail.com wrote:
> On Mon, Jan 4, 2010 at 10:39 PM,  <alan at ajackson.org> wrote:
>> I rather like the R command(s) for reading text files

> Aren't the newly improved numpy.genfromtxt() and friends intended to
> handle all this

Yes, they are, and they are great, but not really all that fast. If 
you've got big complicated tables of data to read, then genfromtxt is 
the way to go -- it's a great tool. However, for the simple stuff, it's 
not really optimized. I also find I have to read a lot of text files 
that aren't tables of data, but rather an odd mix of stuff that still 
involves reading lots of numbers from a file. As far as I can tell, 
genfromtxt and loadtxt can only load the entire file as a table (a very 
common situation, of course).
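
A minimal sketch of the mixed-file case, to make the point concrete. The file layout (a comment line, a count line, then that many numbers) and the name read_block are hypothetical, made up for illustration. It assumes the file is opened in binary mode so that fromfile() continues from the position where the line-by-line reads stopped.

```python
import os
import tempfile

import numpy as np

# Hypothetical file layout: a comment line, a line holding the number of
# values, then that many whitespace-separated numbers.
text = "# station 42\n6\n1.0 2.0 3.0\n4.0 5.0 6.0\n"

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(path, "w") as f:
    f.write(text)


def read_block(path):
    """Read the header lines by hand, then the numbers with fromfile()."""
    with open(path, "rb") as f:  # binary mode gives exact byte offsets
        comment = f.readline().decode().strip()
        count = int(f.readline())
        # A non-empty sep switches fromfile() to text mode; a space in the
        # separator matches any run of whitespace, newlines included.
        data = np.fromfile(f, dtype=np.float64, count=count, sep=" ")
    return comment, data


comment, data = read_block(path)
print(comment, data)
```

loadtxt and genfromtxt would need the header stripped or skipped first, and would still treat the rest of the file as one table.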

Paul Ivanov wrote:
> Just a potshot, but have you tried np.loadtxt?
> I find it pretty fast.

I guess I should have posted timings in the first place:

In [19]: timeit timing.time_genfromtxt()
10 loops, best of 3: 216 ms per loop

In [20]: timeit timing.time_loadtxt()
10 loops, best of 3: 166 ms per loop

In [21]: timeit timing.time_fromfile()
10 loops, best of 3: 47.1 ms per loop

(40,000 doubles from a space-delimited text file)

so fromfile() is about 3.5 times as fast as loadtxt and about 4.6 times 
as fast as genfromtxt. That does make a difference for me -- a user 
waiting four seconds rather than one second for a file to load matters.
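
For anyone who wants to try a comparison like this, here is a rough sketch of what such timing functions could look like. The function names match the ones in the transcript above, but their bodies are guesses, not the actual contents of the timing module. The 4,000 x 10 layout of the test file is likewise an assumption, since the post only says it held 40,000 doubles. Results will vary with machine and NumPy version.

```python
import os
import tempfile
import timeit

import numpy as np

# Assumed layout: 40,000 doubles as 4,000 rows of 10 columns, space-delimited.
# The shape of the original test file is not stated in the post.
rng = np.random.default_rng(0)
values = rng.random((4000, 10))
path = os.path.join(tempfile.mkdtemp(), "doubles.txt")
np.savetxt(path, values, delimiter=" ")


def time_genfromtxt():
    return np.genfromtxt(path)


def time_loadtxt():
    return np.loadtxt(path)


def time_fromfile():
    # Text-mode fromfile() returns a flat 1-D array; reshape it if needed.
    return np.fromfile(path, dtype=np.float64, sep=" ")


for func in (time_genfromtxt, time_loadtxt, time_fromfile):
    # Best of 3 repeats, 3 calls each, reported per call.
    best = min(timeit.repeat(func, number=3, repeat=3)) / 3
    print(f"{func.__name__}: {best * 1e3:.1f} ms per loop")
```

Note that fromfile() returns a flat array, so the caller has to know the number of columns to recover the table shape.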

I suppose another option might be to see if I can optimize the inner 
scanning function of genfromtxt with Cython or C, but I'm not sure 
that's possible, as it's really very flexible, and re-writing all of 
that without Python would be really painful!


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
