Seeking regex optimizer

Paddy paddy3118 at
Sun Jun 18 22:30:55 CEST 2006

Kay Schluehr wrote:
> I have a list of strings ls = [s_1,s_2,...,s_n] and want to create a
> regular expression sx from it, such that sx.match(s) yields a SRE_Match
> object when s starts with an s_i for one i in [0,...,n].  There might
> be relations between those strings: s_k.startswith(s_1) -> True or
> s_k.endswith(s_1) -> True. An extreme case would be ls = ['a', 'aa',
> ...,'aaaa...ab']. For this reason SRE_Match should provide the longest
> possible match.
> Is there a Python module able to create an optimized regex rx from ls
> for the given constraints?
> Regards,
> Kay

A start would be:
  regexp = "^(" + "|".join(sorted(ls, reverse=True)) + ")"
But the above does not work if you have special characters in your
strings.
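
A minimal sketch of one way around the special-character problem, assuming
that re.escape() on each string is acceptable. The helper name
build_prefix_regex is hypothetical, not an existing module. It also swaps
the reverse lexicographic sort for a longest-first sort, since the re module
takes the first alternative that matches rather than the longest one:

```python
import re

def build_prefix_regex(strings):
    """Hypothetical helper: compile a regex matching any of `strings` as a prefix."""
    # Longest first: alternation is tried left to right and the first
    # alternative that matches wins, so longer strings must precede
    # their own prefixes.
    ordered = sorted(set(strings), key=len, reverse=True)
    # re.escape makes characters such as '.', '+' or '(' match literally.
    return re.compile("(" + "|".join(re.escape(s) for s in ordered) + ")")

ls = ["a", "aa", "a.b", "a+"]
rx = build_prefix_regex(ls)

# rx.match() only matches at the start of the string, so no "^" is needed.
print(rx.match("aab").group(0))   # aa
print(rx.match("a.bc").group(0))  # a.b
print(rx.match("axb").group(0))   # a
```

This only fixes escaping and ordering. It does not factor out common
prefixes, so it is not an optimised regex in the sense asked for.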

You say you want something that is optimised. What have you tried?

- Pad.
