Generating text from a regular expression
Tim Chase
python.list at tim.thechases.com
Wed Mar 31 08:01:57 EDT 2010
Nathan Harmston wrote:
> I have a slightly complicated/medium sized regular expression
> and I want to generate all possible words that it can match
> (to compare performance of regex against an acora based
> matcher). Using the regular expression as a grammar to
> generate all words in its language. I was wondering if this
> possible in Python or possible using anything. Google doesnt
> seem to give any obvious answers.
Unless you limit your regexp to bounded regular expressions (you
don't use the "*", "+" and unbounded-top-end "{...}" tokens), it
has an infinite number of possible words that can match. As a
simple example, the regexp "a*" matches "" (the empty string),
"a", "aa", "aaa", "aaaa"...and so on up to the limit of a string
length.
There was a discussion on this a while back:
http://mail.python.org/pipermail/python-list/2010-February/1235120.html
so you might chat with the person who started that thread. You
basically have to write a regexp parser that generates code to
generate the expressions, and even simple expressions such as
"[a-z]{1,30}" can create an astronomical number of possible inputs.
-tkc
More information about the Python-list
mailing list