Generating text from a regular expression
gagsl-py2 at yahoo.com.ar
Wed Mar 31 13:58:03 CEST 2010
En Wed, 31 Mar 2010 07:49:14 -0300, Nathan Harmston
<iwanttobeabadger at googlemail.com> escribió:
> I have a slightly complicated/medium sized regular expression and I
> want to generate all possible words that it can match (to compare
> performance of regex against an acora based matcher). Using the
> regular expression as a grammar to generate all words in its language.
> I was wondering if this possible in Python or possible using anything.
> Google doesnt seem to give any obvious answers.
I've done this some time ago.
provides an infinite merge operation, required for effectively enumerating
a regular language (with shorter strings first, else 'a*b' would get stuck
repeating "a"s and never yielding any "b").
The example at the end shows the basics of how to use it (by hand) in a
very simple case.
To enumerate all strings matching '(a|bc)*' one should invoke
closure(product(['a'],['b','c'])), which gives:
'', 'a', 'aa', 'bc',
'aaa', 'abc', 'bca',
'aaaa', 'aabc', 'abca', 'bcaa', 'bcbc',
'aaaaa', 'aaabc', 'aabca', 'abcaa', 'abcbc', 'bcaaa', 'bcabc', 'bcbca',
Converting the former expression into the later function calls requires a
parser (not shown -- and I won't be able to find it until Friday)
More information about the Python-list