Seeking regex optimizer
paddy3118 at netscape.net
Sun Jun 18 22:30:55 CEST 2006
Kay Schluehr wrote:
> I have a list of strings ls = [s_1,s_2,...,s_n] and want to create a
> regular expression sx from it, such that sx.match(s) yields a SRE_Match
> object when s starts with an s_i for one i in [0,...,n]. There might
> be relations between those strings: s_k.startswith(s_1) -> True or
> s_k.endswith(s_1) -> True. An extreme case would be ls = ['a', 'aa',
> ...,'aaaa...ab']. For this reason SRE_Match should provide the longest
> possible match.
> Is there a Python module able to create an optimized regex rx from ls
> for the given constraints?
A start would be:
regexp = "^(" + "|".join(sorted(ls, reverse=True)) + ")"
But the above does not work if you have special characters in your
strings.
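
One possible way around the special-character problem, as a minimal sketch
rather than a tested solution: pass each string through re.escape before
joining. The helper name build_prefix_regex is made up for illustration, and
the sketch assumes every entry of ls is meant as a literal string. The leading
"^" is left out because match() already anchors at the start of the string.

```python
import re

def build_prefix_regex(ls):
    # Hypothetical helper, not part of any library.
    # A reverse lexicographic sort places every string ahead of its own
    # prefixes, so the ordered alternation tries longer candidates first.
    alternatives = sorted(ls, reverse=True)
    # re.escape makes characters such as '.' or '+' match literally.
    pattern = "(" + "|".join(re.escape(s) for s in alternatives) + ")"
    return re.compile(pattern)

rx = build_prefix_regex(['a', 'aa', 'a.b', 'a+'])
print(rx.match('aab').group(0))   # aa
print(rx.match('a.bc').group(0))  # a.b
print(rx.match('axb').group(0))   # a
```

This only fixes the escaping; it does nothing to factor out common prefixes,
so it is not optimised in any sense.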
You say you want something that is optimised. What have you tried?