[Numpy-discussion] How to read data from text files fast?

Thu Jul 8 13:50:03 EDT 2004

Todd Miller wrote:

> I looked this over to see how hard it would be to port to numarray.  At
> first glance,  it looks easy.  I didn't really read it closely enough to
> pick up bugs, but what I saw looks good.  One thing I did notice was a
> calloc of temporary data space.  That seemed like a possible waste:  can't
> you just preallocate the array and read your data directly into it?

The short answer is that I'm not very smart! The longer answer is that 
this is because at first I misunderstood what PyArray_FromDimsAndData 
was for. For ScanFileN, I'll re-do it as you suggest.

For ScanFile, it is unknown at the beginning how big the final array is, 
so I wrote a scheme that allocates the memory as it goes, in 
reasonably sized chunks. However, this requires a full copy of the data, 
which is a problem. Since posting, I thought of a MUCH easier scheme:

1. Scan the file, without storing the data, to see how many numbers there are.
2. Rewind the file.
3. Allocate the array.
4. Read the data.
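
The steps above can be sketched roughly as follows. This is only a minimal
illustration, not the actual ScanFile code: the function name
scan_file_two_pass is hypothetical, it assumes the file holds only
whitespace-separated numbers, and it uses a plain malloc'd buffer where the
real extension would allocate the array and read into its data pointer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch of the two-pass scheme.
 * Assumes whitespace-separated numbers only.
 * Returns a malloc'd buffer holding *n_out doubles (caller frees),
 * or NULL if the file cannot be opened, is empty, or memory runs out. */
double *scan_file_two_pass(const char *path, long *n_out)
{
    FILE *fp = fopen(path, "r");
    double *data = NULL;
    double tmp;
    long n = 0, i = 0;

    *n_out = 0;
    if (fp == NULL)
        return NULL;

    /* Pass 1: parse every number but throw the value away, just to count. */
    while (fscanf(fp, "%lf", &tmp) == 1)
        n++;

    /* Go back to the start of the file (this also clears the EOF flag). */
    rewind(fp);

    /* Allocate exactly as much as is needed, once. */
    if (n > 0)
        data = malloc((size_t)n * sizeof *data);

    /* Pass 2: read straight into the final buffer, so no copy is needed. */
    if (data != NULL) {
        for (i = 0; i < n; i++) {
            if (fscanf(fp, "%lf", &data[i]) != 1)
                break;
        }
        *n_out = i;
    }

    fclose(fp);
    return data;
}
```

The text gets parsed twice, but the numbers are stored only once.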

This requires scanning the file twice, which would cost some time, but 
would be easier, and would prevent an unnecessary copy of the data. I hope 
I'll get a chance to try it out and see what the performance is like. In 
the meantime, does anyone else have any thoughts?

By the way, does it matter whether I use malloc or calloc? I can't 
really tell the difference from K&R.
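
For reference, in standard C the two calls differ in their arguments and in
initialization: malloc takes a single byte count and leaves the memory
uninitialized, while calloc takes an element count and an element size and
sets every byte to zero. A small sketch (the helper names are hypothetical,
and reading the zeroed bytes back as 0.0 assumes IEEE 754 doubles, which the
C standard does not guarantee):

```c
#include <stdlib.h>

/* malloc: one argument (total bytes); contents are indeterminate
 * until they are written. */
double *alloc_uninitialized(size_t n)
{
    return malloc(n * sizeof(double));
}

/* calloc: two arguments (count, element size); every byte is set to zero.
 * On IEEE 754 platforms all-zero bytes read back as 0.0. */
double *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(double));
}
```

If every element is about to be overwritten by the read anyway, the zeroing
done by calloc is extra work that malloc skips.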

> This is
> probably a very minor speed issue,  but might be a significant storage issue
> as people are starting to max out 32-bit systems.

yup. This is all pointless if it's not a lot of data, after all.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov