Speeding up a regular expression

Michael Lerner mlerner at umich.DELETEME.edu
Tue Oct 23 19:11:03 CEST 2001


Hi,

I'm a relative newbie to Python, and I'm certainly no regular expression
wizard.  I have a text file with a bunch of lines of the form

 1-1.1 2.2 -3.3  4.4     5.5 -6.6

That is, an integer followed by six floats, with an arbitrary number of
spaces between the numbers.  Note that the number of spaces can be
zero, as is the case between the 1 and the -1.1 above.

There are also a bunch of other lines in the file.  I only want the ones
that are like the line above.

So, here's what I did:

---- begin my schlocky code ----

import re
import string   # needed for the string.split() call below

def gimmeWhatIWant(inputString):
    myRe = re.compile(r"""
        ^                    # start at the beginning of the line
        (\s*)                # our leading spaces
        (\d+\s*)             # the integer, which may or may not
                             # have a trailing space!
        (-?\d+\.\d+\s*){6,6} # all six floats MAY have spaces
                             # after them
        $                    # end at the end of the line
        """, re.VERBOSE)

    lines = string.split(inputString,"\n")
    returnString = ""
    for line in lines:
        if myRe.match(line):
            returnString = returnString + line + "\n"

    return returnString

---- end my schlocky code ----
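
As a quick sanity check of the pattern above, here is a minimal sketch
that tests it against the sample line and two lines that should be
rejected.  It assumes a modern Python 3 (so it is not 1.5.2 code), and
the names LINE_RE and is_data_line are made up for illustration.  The
{6} and the non-capturing (?:...) group are intended to be equivalent
to the {6,6} form as far as matching goes.

```python
import re

# Same shape as the pattern in the post: optional leading spaces,
# one integer, then exactly six floats, each optionally followed by spaces.
LINE_RE = re.compile(r"""
    ^                        # start of the line
    \s*                      # leading spaces
    \d+\s*                   # the integer, maybe followed by spaces
    (?:-?\d+\.\d+\s*){6}     # six floats, maybe followed by spaces
    $                        # end of the line
    """, re.VERBOSE)


def is_data_line(line):
    """Return True if the line is an integer followed by six floats."""
    return LINE_RE.match(line) is not None


if __name__ == "__main__":
    for candidate in (" 1-1.1 2.2 -3.3  4.4     5.5 -6.6",
                      "1 2.2 3.3",
                      "some other line"):
        print(repr(candidate), is_data_line(candidate))
```

Only the first of the three candidate lines has the integer-plus-six-floats
shape, so it is the only one expected to pass.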

The thing is, this is slow when I run it on input strings with 6 or 7
thousand lines.

Any hints on how I could speed it up?

One thing:  I think that replacing the string.split(...) call with
inputString.split("\n") might speed things up a little.  But that's not
where most of the time is spent, and I'd like this to work with Python
1.5.2 if possible.
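
For comparison, a minimal sketch of the variant described above, using
the split method instead of string.split.  It also makes two further
changes that are assumptions, not something measured here: the pattern
is compiled once at module level, and the matching lines are joined in
a single step instead of being appended one at a time.  It assumes
Python 3, so it does not meet the 1.5.2 requirement, and the names
DATA_LINE and filter_data_lines are hypothetical.  Whether any of this
helps depends on where the time actually goes, which would need
profiling on the real input.

```python
import re

# Compiled once at import time instead of on every call.
DATA_LINE = re.compile(r"^\s*\d+\s*(?:-?\d+\.\d+\s*){6}$")


def filter_data_lines(input_string):
    """Return only the integer-plus-six-floats lines, each ending in a newline."""
    # String method instead of string.split(inputString, "\n").
    kept = [line for line in input_string.split("\n") if DATA_LINE.match(line)]
    # Build the result in one join rather than repeated concatenation.
    return "".join(line + "\n" for line in kept)


if __name__ == "__main__":
    sample = (
        "header line\n"
        " 1-1.1 2.2 -3.3  4.4     5.5 -6.6\n"
        "footer line\n"
    )
    print(filter_data_lines(sample), end="")
```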

thanks,

-michael



