Generating text from a regular expression

Tim Chase python.list at tim.thechases.com
Wed Mar 31 08:01:57 EDT 2010


Nathan Harmston wrote:
> I have a slightly complicated/medium sized regular expression
> and I want to generate all possible words that it can match
> (to compare performance of regex against an acora based
> matcher). Using the regular expression as a grammar to
> generate all words in its language. I was wondering if this
> possible in Python or possible using anything. Google doesnt
> seem to give any obvious answers.

Unless you limit your regexp to bounded regular expressions (you 
don't use the "*", "+" and unbounded-top-end "{...}" tokens), it 
has an infinite number of possible words that can match.  As a 
simple example, the regexp "a*" matches "" (the empty string), 
"a", "aa", "aaa", "aaaa"...and so on up to the limit of a string 
length.

There was a discussion on this a while back:

http://mail.python.org/pipermail/python-list/2010-February/1235120.html

so you might chat with the person who started that thread.  You 
basically have to write a regexp parser that generates code to 
generate the expressions, and even simple expressions such as 
"[a-z]{1,30}" can create an astronomical number of possible inputs.

-tkc






More information about the Python-list mailing list