passing multiple strings to string.find()

François Pinard pinard at
Sat Aug 9 17:55:03 CEST 2003

[Fredrik Lundh]

> Francois Pinard wrote:

> > Given the above,
> >
> >    build_regexp(['this', 'that', 'the-other'])
> >
> > yields the string 'th(?:is|at|e\\-other)', which one may choose to
> > `re.compile' before use.

> the SRE compiler looks for common prefixes, so "th(?:is|at|e\\-other)" is
> no different from "this|that|the-other" on the engine level.

Thanks for the note.  So the `build_regexp' function is not useful after
all.  It was indirectly written around a speed problem in the GNU regexp
engine, but seemingly, the Python regexp engine knows better already.  As I
wrote earlier, I first saw Emacs Lisp `regexp-opt' used within `enscript'.

A speed comparison between both methods shows that they are fairly
equivalent.  A small difference is that `build_regexp', when one of
the words is a prefix of another, automatically recognises the longest one,
while a naive regexp of '|'.join(words) recognises whatever happens to be
listed first.  Of course, this is easily solved by sorting, then reversing
the word list before producing the naive regexp, so that a longer word
always comes before any word which is a prefix of it.
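
To illustrate the prefix issue, here is a minimal sketch.  The helper name
`naive_regexp' and the sample words are made up for this example, and the
use of `re.escape' is an assumption about how one would protect special
characters in the words; it is not the `build_regexp' function itself.

```python
import re

def naive_regexp(words):
    # Hypothetical helper: sort, then reverse, so that a longer word is
    # always listed before any word which is a prefix of it.
    ordered = sorted(words, reverse=True)
    return '|'.join(re.escape(word) for word in ordered)

words = ['the', 'then', 'this']

# Alternation in the order given: 'the' is tried before 'then'.
unsorted_pattern = re.compile('|'.join(re.escape(word) for word in words))

# Alternation after sorting and reversing: 'then' is tried before 'the'.
sorted_pattern = re.compile(naive_regexp(words))

text = 'thence'
unsorted_match = unsorted_pattern.match(text).group()
sorted_match = sorted_pattern.match(text).group()

print(unsorted_match)
print(sorted_match)
```

With the words in their original order, the match on 'thence' stops at
'the'; with the reversed sorted order, it extends to 'then'.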

François Pinard
