
On 11/16/10 7:31 AM, Darren Dale wrote:
On Tue, Nov 16, 2010 at 9:55 AM, Pauli Virtanen<pav@iki.fi> wrote:
Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote: [clip]
That loop takes 0.33 seconds to execute, which is a good start. I need some help converting this example to return an actual numpy array. Could anyone please offer a suggestion?
Darren,

It's interesting that you found fromstring() so slow. I've put some time into trying to make fromfile() and fromstring() a bit more robust and full-featured, and found it to be really painful code to work on, but it didn't dawn on me that it would be slow too! I saw all the layers of function calls, but I still thought their cost would be minimal compared to the actual string parsing. I guess not. It shows that you never know where your bottlenecks are without profiling.

"Slow" is relative, of course, but since the whole point of fromfile()/fromstring() is performance (otherwise, we'd just parse with Python), it would be nice to get them as fast as possible.

I had been thinking that the way to write a good fromfile() was Cython, so you've inspired me to think about it some more. Would you be interested in extending what you're doing into a more general-purpose tool?

Anyway, a comment or two:
cdef extern from 'stdlib.h':
    double atof(char*)
One thing I found with the current numpy code is that the use of the ato* functions is a source of a lot of bugs (all of them?). The core problem is error handling: you have to do a lot of pointer checking to see if a call was successful, and in the fromfile code that error handling is not done in all the layers of calls.

Does anyone know what the advantage of ato* is over scanf()/fscanf()?

Also, why are you parsing strings rather than parsing the files directly? Wouldn't that be a bit faster?

I've got some C extension code for simple parsing of text files into arrays of floats or doubles (using fscanf). I'd be curious how the performance compares to what you've got. Let me know if you're interested.

-Chris
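To make the error-handling point concrete, here is a minimal C sketch. It is not the numpy code and not the C extension mentioned above. It assumes strtod() is swapped in for atof(), since atof() gives the caller no way to detect a failed conversion. The helper names parse_double_checked and read_doubles are hypothetical, made up for this illustration. The second helper shows the fscanf() style of reading straight from a file.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse one double from the start of `s`.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 * Trailing characters after the number are not treated as an error. */
static int parse_double_checked(const char *s, double *out)
{
    char *end;
    double val;

    errno = 0;
    val = strtod(s, &end);

    if (end == s)           /* no characters were converted */
        return -1;
    if (errno == ERANGE)    /* result out of range for a double */
        return -1;

    *out = val;
    return 0;
}

/* Hypothetical helper: read up to `max` whitespace-separated doubles
 * from `fp` using fscanf. Stops at end of file or at the first token
 * that is not a number. Returns the count of values stored in buf. */
static size_t read_doubles(FILE *fp, double *buf, size_t max)
{
    size_t n = 0;

    while (n < max && fscanf(fp, "%lf", &buf[n]) == 1)
        n++;
    return n;
}
```

With these, parse_double_checked("abc", &v) returns -1, whereas atof("abc") returns 0.0, which the caller cannot tell apart from a genuine zero in the input.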
def test():
    py_string = '100'
    cdef char* c_string = py_string
    cdef int i, j
    cdef double val
    i = 0
    j = 2048*1200
    cdef np.ndarray[np.float64_t, ndim=1] ret

    ret_arr = np.empty((2048*1200,), dtype=np.float64)
    ret = ret_arr

    d = time.time()
    while i<j:
        c_string = py_string
        ret[i] = atof(c_string)
        i += 1
    ret_arr.shape = (1200, 2048)
    print ret_arr, ret_arr.shape, time.time()-d
The loop now takes only 0.11 seconds to execute. Thanks again.
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov