Enumerating Regular Expressions

Karthik Gurusamy kar1107 at gmail.com
Tue May 9 21:06:13 EDT 2006


blair.bethwaite at gmail.com wrote:
> James Stroud wrote:
> > You see the difficulty don't you? How will the computer know in advance
> > that the regex matches only a finite set of possible strings?
>
> Well sure it might be a little difficult to figure _that_ out, although
> probably not all that hard if you converted to an FSA or something.  I
> imagine detecting looping would be as simple as applying the right
> graph algorithm.

That's right. For finite regular language, you don't allow cycles. That
means the graph must be a DAG (directed acyclic graph. If not directed
it must be a tree, simple to check as node-count is edge-count plus
one).

Once you have a DAG, the problem becomes enumerating all paths from
root node to any final node. This is a pretty straighforward problem in
graph theory. I think you can simply find all from root-node to any
other node and discard any of the path ending in a non-final state
node.

Karthik

>
> But that's not the point, you don't strictly need to know in advance
> whether the regex represents a finite or infinite language.  I just
> want to enumerate it, if it's going to take longer than the life of the
> universe and a stack size to match to do it, then so be it.  It's
> surely up to the user to be careful about how they form their
> expressions.  One way to get around it would be to disallow the *
> operator in enumerations.
> 
> Cheers,
> -Blair




More information about the Python-list mailing list