How to check if any item from a list of strings is in a big string?

Nobody nobody at nowhere.com
Tue Jul 14 17:40:27 EDT 2009


On Tue, 14 Jul 2009 02:06:04 -0300, Gabriel Genellina wrote:

>> Matt, how many words are you looking for, in how long a string ?
>> Were you able to time any( substr in long_string ) against re.compile
>> ( "|".join( list_items )) ?
> 
> There is a known algorithm to solve specifically this problem  
> (Aho-Corasick), a good implementation should perform better than R.E. (and  
> better than the gen.expr. with the advantage of returning WHICH string  
> matched)

Aho-Corasick has the advantage of being linear in the length of the
patterns, so the setup may be faster than re.compile(). The actual
searching won't necessarily be any faster (assuming optimal
implementations; I don't know how safe that assumption is).




More information about the Python-list mailing list