Regular Expressions: large amount of or's

Kent Johnson kent37 at tds.net
Tue Mar 1 16:29:25 EST 2005


André Søreng wrote:
> 
> Hi!
> 
> Given a string, I want to find all occurrences of
> certain predefined words in that string. Problem is, the list of
> words that should be detected can be in the order of thousands.
> 
> With the re module, this can be solved something like this:
> 
> import re
> 
> r = re.compile("word1|word2|word3|.......|wordN")
> r.findall(some_string)
> 
> Unfortunately, when having more than about 10 000 words in
> the regexp, I get a regular expression runtime error when
> trying to execute the findall function (compile works fine, but slow).

What error do you get? What version of Python are you using? The re module was changed in
Python 2.4 so that its matching engine is no longer recursive, so if you are getting a stack
overflow in Python 2.3 you should try 2.4.
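
If it helps to answer those two questions, here is a rough sketch that prints the interpreter
version, builds the alternation from a word list, and reports whatever exception findall raises
instead of letting it propagate. It is written for a current Python 3 and has not been run on
2.3. The names build_alternation and try_findall are made up for illustration, and the generated
"wordN" list only stands in for your real words. Escaping the words and putting longer ones
first are my own additions, not something from your post.

```python
import re
import sys


def build_alternation(words):
    """Compile one pattern that matches any of the given literal words."""
    # Escape each word so characters such as '.' or '+' match literally,
    # and try longer words first so a short word cannot shadow a longer one.
    ordered = sorted(set(words), key=len, reverse=True)
    return re.compile("|".join(re.escape(w) for w in ordered))


def try_findall(words, text):
    """Return (matches, error); error is None when findall succeeds."""
    try:
        return build_alternation(words).findall(text), None
    except (re.error, RuntimeError, OverflowError) as exc:
        # Hand the exception back so it can be quoted in a bug report.
        return None, exc


if __name__ == "__main__":
    print(sys.version)
    # Placeholder word list; substitute the real one.
    words = ["word%d" % i for i in range(10000)]
    matches, error = try_findall(words, "xx word42 yy word9999 zz")
    print(matches, error)
```

If error comes back as something other than None, its type and message together with the
printed version string are what I was asking for above.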

Kent


