need fast parser for comma/space delimited numbers

Tim Peters tim_one at email.msn.com
Sat Mar 18 13:05:40 EST 2000


[posted & mailed]

[Les Schaffer]
> ...
> Still, the app takes about 5 minutes to parse a typical set of data
> files. I'd like to drop that down to a minute if possible.

Before you go nuts with low-level trickery, check what you're actually
measuring:

> ...
> here's where things stand at the moment:
>
>          186209 function calls in 305.260 CPU seconds
>
>    Ordered by: standard name
> ...

Is "305.260 CPU seconds" the "about 5 minutes" you're talking about?  If so,
tell us how long the app takes when *not* using the profiler <0.5 wink --
the profiler is good for getting a sense of relative times, but usually adds
very significant per-call overheads of its own>.
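
One way to see that overhead for yourself is to time the same work with and
without the profiler attached. What follows is only a sketch in present-day
Python 3, not the original poster's code: parse_line and workload are made-up
stand-ins, and cProfile is an assumption on my part (a run from 2000 would have
used the older pure-Python profile module, which costs more per call, so the
gap shown here is likely smaller than the one in the quoted output).

```python
import cProfile
import time


def parse_line(fields):
    # Hypothetical stand-in for the parser being profiled.
    values = [float(f) for f in fields]
    values[0] = int(values[0])
    return values


def workload(n=20000):
    # Call the parser n times on one fixed record.
    data = ["378", "   0.001094949000", "  0.000031531040", "  0.005158320000"]
    for _ in range(n):
        parse_line(data)


def time_plain(n=20000):
    # Wall-clock seconds for the workload with no profiler attached.
    start = time.perf_counter()
    workload(n)
    return time.perf_counter() - start


def time_profiled(n=20000):
    # Wall-clock seconds for the same workload run under cProfile.
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(workload, n)
    return time.perf_counter() - start


if __name__ == "__main__":
    print("plain:   ", round(time_plain(), 3))
    print("profiled:", round(time_profiled(), 3))
```

The absolute numbers depend on the machine; the thing to compare is the ratio
between the two lines of output.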

Picking on the most expensive function:

    def __parseIFF(self, str):
        """..."""
        array = [string.atoi(str[0])]
        for item in str[1:] :
            array.append( string.atof(item)  )
        return array

First we can speed it up:

    def __parseIFF(self, str):
        """..."""
        array = map(float, str)  # btw, "str" is a poor name for a list
        array[0] = int(array[0])
        return array
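
If you want to convince yourself that the rewrite returns the same thing as
the loop, a quick side-by-side check helps. This is a sketch in Python 3
rather than the Python of the code above: the names parse_loop and parse_map
are hypothetical, int() and float() stand in for string.atoi and string.atof,
and map() needs wrapping in list() because in Python 3 it returns an iterator
that can't be indexed.

```python
def parse_loop(fields):
    # Loop version: first field as an int, the rest as floats.
    values = [int(fields[0])]
    for item in fields[1:]:
        values.append(float(item))
    return values


def parse_map(fields):
    # map version: convert everything to float, then fix up the first field.
    values = list(map(float, fields))
    values[0] = int(values[0])
    return values


if __name__ == "__main__":
    sample = ["378", "   0.001094949000", "  0.000031531040", "  0.005158320000"]
    print(parse_loop(sample) == parse_map(sample))
```

The two differ only on input the loop version would reject anyway: a first
field such as "378.5" makes int() raise in parse_loop, while parse_map quietly
truncates it to 378.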

and then you should write it inline.  This self-contained test is close to
your worst-case parsing problem and takes under 25 seconds on my creaky old
P5-166:

def doit():
    data = ["378",
            "   0.001094949000",
            "  0.000031531040",
            "  0.005158320000"]
    _int, _map, _float = int, map, float  # localize for minor gain
    for i in xrange(86436):   # the # of times the parser got called
        array = _map(_float, data)
        array[0] = _int(array[0])

from time import clock
start = clock()
doit()
finish = clock()
print round(finish - start, 3)
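
For anyone trying this on a current interpreter, here is a rough Python 3 port
of the same test. Treat it as a sketch under a few assumptions: xrange becomes
range, map() is wrapped in list() so the result can be indexed, print is a
function, and time.perf_counter replaces time.clock, which no longer exists in
recent Python 3 releases. The calls parameter and the return value are
additions of mine so the function is easy to check.

```python
from time import perf_counter


def doit(calls=86436):
    # 86436 is the number of times the parser got called in the profile.
    data = ["378",
            "   0.001094949000",
            "  0.000031531040",
            "  0.005158320000"]
    _int, _map, _float = int, map, float  # localize for minor gain
    array = []
    for _ in range(calls):
        array = list(_map(_float, data))
        array[0] = _int(array[0])
    return array


if __name__ == "__main__":
    start = perf_counter()
    doit()
    finish = perf_counter()
    print(round(finish - start, 3))
```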

no-need-for-c-ly y'rs  - tim





