[Numpy-discussion] How to read data from text files fast?
Chris Barker
Chris.Barker at noaa.gov
Thu Jul 8 13:50:03 EDT 2004
Todd Miller wrote:
> I looked this over to see how hard it would be to port to numarray. At
> first glance, it looks easy. I didn't really read it closely enough to
> pick up bugs, but what I saw looks good. One thing I did notice was a
> calloc of temporary data space. That seemed like a possible waste: can't
> you just preallocate the array and read your data directly into it?
The short answer is that I'm not very smart! The longer answer is that
this is because at first I misunderstood what PyArray_FromDimsAndData
was for. For ScanFileN, I'll re-do it as you suggest.
For ScanFile, the size of the final array is unknown at the start, so I
wrote a scheme that allocates the memory as it goes, in reasonably
sized chunks. However, this does require a full copy of the data, which
is a problem. Since posting, I thought of a MUCH easier scheme:
1. Scan the file, without storing the data, to see how many numbers there are.
2. Rewind the file.
3. Allocate the array.
4. Read the data.
This requires scanning the file twice, which has a cost, but it would be
easier and would avoid an unnecessary copy of the data. I hope I'll get a
chance to try it out and see what the performance is like. In the
meantime, does anyone else have any thoughts?
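
A rough sketch of those four steps in plain C, to make the idea concrete.
It assumes whitespace-separated numbers and a seekable file, and it uses
malloc as a stand-in for allocating the real array (in the extension the
buffer would belong to the array object instead). The names count_doubles
and scan_file_two_pass are hypothetical, not the ones in the posted code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Pass 1: count the numbers in the stream without storing them. */
static size_t count_doubles(FILE *fp)
{
    size_t n = 0;
    double tmp;

    while (fscanf(fp, "%lf", &tmp) == 1)
        n++;
    return n;
}

/* Hypothetical helper: two-pass read of whitespace-separated doubles.
 * Returns a malloc'd buffer (caller frees) and stores the number of
 * values read in *n_out. Returns NULL if allocation fails. */
static double *scan_file_two_pass(FILE *fp, size_t *n_out)
{
    size_t n, i = 0;
    double *data;

    assert(fp != NULL && n_out != NULL);

    n = count_doubles(fp);          /* step 1: count            */
    rewind(fp);                     /* step 2: rewind           */

    /* step 3: allocate exactly once; ask for at least one element
     * so that an empty file does not request zero bytes */
    data = malloc((n ? n : 1) * sizeof *data);
    if (data == NULL)
        return NULL;

    /* step 4: read straight into the final buffer, no copy */
    while (i < n && fscanf(fp, "%lf", &data[i]) == 1)
        i++;

    *n_out = i;
    return data;
}
```

The trade-off is the one described above: the text is parsed twice, but
the data is written exactly once into a buffer of exactly the right size.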
By the way, does it matter whether I use malloc or calloc? I can't
really tell the difference from K&R.
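
(For reference, as far as standard C goes: calloc takes an element count
and an element size and returns memory with all bytes set to zero, while
malloc takes a byte count and leaves the contents indeterminate. If the
read loop overwrites every element anyway, the zeroing is extra work. A
small sketch, with hypothetical helper names:)

```c
#include <assert.h>
#include <stdlib.h>

/* calloc: count and element size are separate arguments, and the
 * returned memory is zero-filled. */
static double *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(double));
}

/* malloc: one byte count, contents are indeterminate until written. */
static double *alloc_uninit(size_t n)
{
    return malloc(n * sizeof(double));
}

/* Returns 1 if every byte in buf is zero, 0 otherwise. */
static int all_zero(const unsigned char *buf, size_t nbytes)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < nbytes; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```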
> This is
> probably a very minor speed issue, but might be a significant storage issue
> as people are starting to max out 32-bit systems.
Yup. This is all pointless if it's not a lot of data, after all.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov