Enumerating Regular Expressions
Karthik Gurusamy
kar1107 at gmail.com
Tue May 9 21:06:13 EDT 2006
blair.bethwaite at gmail.com wrote:
> James Stroud wrote:
> > You see the difficulty don't you? How will the computer know in advance
> > that the regex matches only a finite set of possible strings?
>
> Well sure it might be a little difficult to figure _that_ out, although
> probably not all that hard if you converted to an FSA or something. I
> imagine detecting looping would be as simple as applying the right
> graph algorithm.
That's right. For finite regular language, you don't allow cycles. That
means the graph must be a DAG (directed acyclic graph. If not directed
it must be a tree, simple to check as node-count is edge-count plus
one).
Once you have a DAG, the problem becomes enumerating all paths from
root node to any final node. This is a pretty straighforward problem in
graph theory. I think you can simply find all from root-node to any
other node and discard any of the path ending in a non-final state
node.
Karthik
>
> But that's not the point, you don't strictly need to know in advance
> whether the regex represents a finite or infinite language. I just
> want to enumerate it, if it's going to take longer than the life of the
> universe and a stack size to match to do it, then so be it. It's
> surely up to the user to be careful about how they form their
> expressions. One way to get around it would be to disallow the *
> operator in enumerations.
>
> Cheers,
> -Blair
More information about the Python-list
mailing list