Regular expressions in python

Harvey Thomas hst at empolis.co.uk
Wed Jul 3 11:53:09 EDT 2002


Graeme Longman [mailto:glongman at ilangua.com] wrote
> Hi,
> 
> I'm using the python module re to search through strings of html text
> but I have found that it is taking too long using the search method.
> 
> I am looping through a list of regular expressions and I find that it
> takes much longer when no match is found for the expression than it
> does when a match is found. Is this normal?
> 
> I have fixed the problem for now by using string.find() before
> searching the text but was wondering if anyone had any ideas on a
> better technique.
> 
> Is there something else I should be using? I am using '.*' and
> re.DOTALL in my expressions but that doesn't seem to be the problem.
> 
> Thanks for any help in advance.
> 
> Graeme
> 
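I've found it is MUCH faster if you combine your list of regular expressions into a single pattern: wrap each expression in parentheses, join them with | (use re.VERBOSE as well, since it lets you lay the combined pattern out over several lines), and then call re.findall once. That way you get one list of tuples, one tuple per match, in which every group that did not take part in the match comes back as the empty string.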
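
A minimal sketch of that approach, assuming Python 3. The three patterns and the sample HTML are made up for illustration and are not taken from Graeme's code. Two things to watch for: under re.VERBOSE any literal whitespace or # inside a pattern must be escaped, and if the individual patterns contain capturing groups of their own, the tuples will have extra columns.

```python
import re

# Hypothetical patterns, for illustration only (not from the original post).
# They contain no literal whitespace and no '#', so re.VERBOSE leaves them intact.
patterns = [
    r"<title>.*?</title>",
    r"<h1>.*?</h1>",
    r"<img\s[^>]*>",
]

# Wrap each pattern in its own group and join the groups with "|".
# The newline and spaces in the separator are ignored under re.VERBOSE.
combined = re.compile(
    "\n | ".join("(%s)" % p for p in patterns),
    re.DOTALL | re.VERBOSE,
)

html = "<html><title>Demo</title><body><h1>Hello</h1></body></html>"

# One pass over the text: each tuple has one slot per pattern,
# and the slots of patterns that did not match are empty strings.
matches = combined.findall(html)
print(matches)
# [('<title>Demo</title>', '', ''), ('', '<h1>Hello</h1>', '')]

# Recover which pattern produced each match from the non-empty slot.
hits = [(i, text) for groups in matches for i, text in enumerate(groups) if text]
print(hits)
# [(0, '<title>Demo</title>'), (1, '<h1>Hello</h1>')]
```

The index in each hit is the position of the pattern in the original list, so you can still tell the expressions apart after merging them.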

HTH

Harvey
