Looking for a regexp generator based on a set of known string representative of a string set

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Fri Sep 8 17:55:50 CEST 2006


vbfoobar at gmail.com:
> I am looking for python code that takes as input a list of strings
> (most similar, but not necessarily, and rather short: say not longer
> than 50 chars) and that computes and outputs the python regular
> expression that matches these string values (not necessarily strictly,
> perhaps the code is able to determine patterns, i.e. families of
> strings...).

This may be a very simple starting point:

>>> import re
>>> strings = ["foo", "bar", "$spam"]
>>> strings2 = "|".join(re.escape(s) for s in strings)
>>> strings2
'foo|bar|\\$spam'
>>> finds = re.compile(strings2)
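
One caveat with a plain alternation: Python tries the alternatives from left to right, so if one string is a prefix of another (say "foo" and "foobar"), the shorter one can win when searching. A minimal sketch that sorts the alternatives longest-first follows; the helper name build_alternation is made up for illustration, and re.fullmatch assumes a recent Python:

```python
import re

def build_alternation(strings):
    # Drop duplicates while keeping the input order (hypothetical helper).
    unique = list(dict.fromkeys(strings))
    # Longest first, so a string is never shadowed by one of its own prefixes.
    ordered = sorted(unique, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))

finds = build_alternation(["foo", "foobar", "bar", "$spam"])

# Searching inside a longer text picks the longest known string at each spot.
print(finds.findall("foobar $spam bar"))

# fullmatch accepts only the known strings themselves.
print(bool(finds.fullmatch("foo")))
print(bool(finds.fullmatch("fo")))
```

Use finds.fullmatch when a whole value must be one of the known strings, and finds.search or finds.findall when looking for them inside a longer text.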

But I don't know how well this works with many longer strings.
If that's not enough, we can think about more complex solutions.
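
One possible "more complex" direction, as a rough sketch only: put the strings into a character trie and render the trie as a regexp, so that shared prefixes are written once instead of being repeated in every alternative. The function name trie_regex is hypothetical, and this still matches exactly the given strings; it does not try to guess families of strings.

```python
import re

def trie_regex(strings):
    # Build a character trie; the "" key marks the end of a known string.
    trie = {}
    for s in strings:
        node = trie
        for ch in s:
            node = node.setdefault(ch, {})
        node[""] = {}

    def render(node):
        # If a known string ends here, whatever follows is optional.
        optional = "" in node
        parts = [re.escape(ch) + render(child)
                 for ch, child in sorted(node.items()) if ch != ""]
        if not parts:
            return ""
        if len(parts) == 1 and not optional:
            return parts[0]
        body = "(?:" + "|".join(parts) + ")"
        return body + "?" if optional else body

    return render(trie)

words = ["foo", "foobar", "food", "bar"]
pattern = trie_regex(words)
print(pattern)

# Check the pattern against the inputs and one near miss.
finds = re.compile(pattern)
print(all(finds.fullmatch(w) for w in words))
print(finds.fullmatch("foob"))
```

The resulting pattern is meant to be used with fullmatch (or with explicit anchors). Note that an empty input list gives an empty pattern, which matches the empty string.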

Bye,
bearophile



