regular expression example generator

Jon Fernquest ferni at loxinfo.co.th
Fri Oct 8 00:26:41 EDT 1999


John Farrell wrote:
> I have a faint recollection of hearing somewhere that someone for some
> language had written some software which took a regular expression and
> generated for you the types of things that would match it:

Grail (free,C++ finite state library) can generate the set of
strings that a finite state machine defines.
http://www.csd.uwo.ca/research/grail/grail.html
Though first you have to generate the finite state machine from the regex.

Anyway, just looking at the finite state machine for a given
regex is going to make that regex more understandable/friendly.

You can also add error states to the finite state machine
so you can know what went wrong when an expected regex match
failed.

If you want to roll your own example generator...
You can generate example strings for the regex by traversing
finite state machine's graph repeatedly from start to finish node.
You can choose random paths if the regex can generate an infinite
or large number of strings or if the number of strings it can
generate is finite you can do a thorough traversal of every possible
path from a start to an end state.

Elements in the regex like ".*", "\w+", ".+?" are obviously going to cause
problems if you don't limit the alphabet and the number of iterations
in any expansion. You can also generate strings with a reduced alphabet.

My first impulse when I see a knarly regex is to pull my hair out.
Seeing it reduced to a simple finite state diagram is a calming experience.
(see for example p65 Hall and Schwartz, Effective Perl Programming).

IMHO Complicated textual patterns were not meant (by god?) to be described
within the confines of a string, but a graph? Graphs must have some favored
status in the universe. (See all the beautiful automata graphs that can be
used for parsing in Gough (1988). Syntax Analysis and Software Tools.)

A powerful graph editor like:
http://www.fmi.uni-passau.de/Graphlet/
Would be a nice complement to Perl.
This is already a Tcl/Tk system based on C code.
So it should be easy to integrate it into Python/Tkinter??????

It could be used for a much more powerful set of finite state
tools than Perl currently has. See the notes from lecture 4
at Columbia on the Bell Labs set of finite state tools:
http://www.cs.columbia.edu/~mohri/notes.html
Index:
http://www.cvn.columbia.edu/courses/Summer1999/COMSE6998-2.html


Jon Fernquest
bayinnaung at hotmail.com
ferni at loxinfo.co.th












More information about the Python-list mailing list