Regular expressions in Python
Harvey Thomas
hst at empolis.co.uk
Wed Jul 3 11:53:09 EDT 2002
Graeme Longman [mailto:glongman at ilangua.com] wrote
> Hi,
>
> I'm using the python module re to search through strings of html text
> but I have found that it is taking too long using the search method.
>
> I am looping through a list of regular expressions and I find that it
> takes much longer when no match is found for the expression than it
> does when a match is found. Is this normal?
>
> I have fixed the problem for now by using string.find() before
> searching the text but was wondering if anyone had any ideas on a
> better technique.
>
> Is there something else I should be using? I am using '.*' and
> re.DOTALL in my expressions but that doesn't seem to be the problem.
>
> Thanks for any help in advance.
>
> Graeme
>
I've found it is MUCH faster to combine your list of regular expressions into a single pattern: wrap each expression in parentheses so that it becomes its own group, join the groups with | (use re.VERBOSE as well!), and then call re.findall once on the text. That way you get one giant list of tuples, one tuple per match, in which the groups that did not match come back as empty strings.
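
A minimal sketch of that approach follows. The three patterns and the sample HTML are made up for illustration, since the original question does not show any, and the code is written for current Python rather than the 2002 release. It leaves out re.VERBOSE, because under that flag any literal whitespace or # inside the individual expressions would have to be escaped.

```python
import re

# Hypothetical patterns, for illustration only.
patterns = [
    r'<title>.*?</title>',
    r'<h1>.*?</h1>',
    r'<a\s+href="[^"]*"',
]

# Wrap each pattern in its own capturing group and join the groups with |.
combined = re.compile(
    "|".join("(%s)" % p for p in patterns),
    re.DOTALL,
)

# Made-up sample text.
html = ('<html><title>Demo</title><body><h1>Hi</h1>'
        '<a href="x.html">x</a></body></html>')

# One pass over the text. Each match is a tuple with one slot per pattern;
# the slots for the patterns that did not match hold the empty string.
matches = combined.findall(html)

# Sort the hits back into one list per original pattern.
found = [[] for _ in patterns]
for groups in matches:
    for i, text in enumerate(groups):
        if text:
            found[i].append(text)

for pattern, hits in zip(patterns, found):
    print(pattern, "->", hits)
```

Two caveats with this sketch. If an individual expression contains capturing groups of its own, the tuple positions shift, so those inner groups should be written as non-capturing (?:...) groups. Also, the combined pattern reports non-overlapping matches and takes the first alternative that matches at each position, so the results can differ from running each expression separately when the expressions can match overlapping text.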
HTH
Harvey