need fast parser for comma/space delimited numbers
Tim Peters
tim_one at email.msn.com
Sat Mar 18 13:05:40 EST 2000
[posted & mailed]
[Les Schaffer]
> ...
> Still, the app takes about 5 minutes to parse a typical set of data
> files. I'd like to drop that down to a minute if possible.
Before you go nuts with low-level trickery,
> ...
> here's where things stand at the moment:
>
> 186209 function calls in 305.260 CPU seconds
>
> Ordered by: standard name
> ...
Is "305.260 CPU seconds" the "about 5 minutes" you're talking about? If so,
tell us how long the app takes when *not* using the profiler <0.5 wink --
the profiler is good for getting a sense of relative times, but usually adds
very significant per-call overheads of its own>.
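To get a feel for how large that per-call overhead can be, a rough sketch along these lines may help. It assumes a current Python 3 and uses cProfile (which postdates this thread) rather than the original profile module; tiny, workload, wall_time and profiled_time are made-up names for illustration, not anything from the app under discussion.

```python
import cProfile
import time


def tiny(x):
    # Deliberately trivial, so call overhead dominates the run time.
    return x + 1


def workload(n=200000):
    # Calls tiny() n times; returns n when started from 0.
    total = 0
    for _ in range(n):
        total = tiny(total)
    return total


def wall_time(func):
    # Elapsed wall-clock seconds for one plain call of func.
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def profiled_time(func):
    # Elapsed wall-clock seconds for one call of func under the profiler.
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(func)
    return time.perf_counter() - start


plain = wall_time(workload)
profiled = profiled_time(workload)
print("plain: %.3f s, under profiler: %.3f s" % (plain, profiled))
```

The two numbers will vary by machine and Python version, so treat the ratio as a rough indication only; the point is simply that the profiled figure is not the app's real running time.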
Picking on the most expensive function:
    def __parseIFF(self, str):
        """..."""
        array = [string.atoi(str[0])]
        for item in str[1:]:
            array.append(string.atof(item))
        return array
First we can speed it up:
    def __parseIFF(self, str):
        """..."""
        array = map(float, str)  # btw, "str" is a poor name for a list
        array[0] = int(array[0])
        return array
and then you should write it inline. This self-contained test is close to
your worst-case parsing problem and takes under 25 seconds on my creaky old
P5-166:
    def doit():
        data = ["378",
                " 0.001094949000",
                " 0.000031531040",
                " 0.005158320000"]
        _int, _map, _float = int, map, float  # localize for minor gain
        for i in xrange(86436):  # the # of times the parser got called
            array = _map(_float, data)
            array[0] = _int(array[0])

    from time import clock
    start = clock()
    doit()
    finish = clock()
    print round(finish - start, 3)
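For anyone trying the test above on a current Python 3, a rough equivalent might look like the sketch below. It rests on a few assumptions about the newer language: map returns an iterator there, so it is wrapped in list(); xrange is spelled range; print is a function; and time.clock is gone (removed in 3.8), so perf_counter stands in for it. The name parse_fields is made up for illustration.

```python
from time import perf_counter


def parse_fields(fields):
    """Convert a list of numeric strings; the first field becomes an int."""
    values = list(map(float, fields))  # float() ignores surrounding blanks
    values[0] = int(values[0])
    return values


def doit(calls=86436):
    # Same data and call count as the test in the post, parsing written inline.
    data = ["378",
            " 0.001094949000",
            " 0.000031531040",
            " 0.005158320000"]
    values = []
    for _ in range(calls):
        values = list(map(float, data))
        values[0] = int(values[0])
    return values


start = perf_counter()
last = doit()
finish = perf_counter()
print(round(finish - start, 3))
```

The elapsed time printed depends entirely on the machine, so no figure is given here. Localizing int, map and float as in the original was a minor gain even then and is left out of this sketch.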
no-need-for-c-ly y'rs - tim