How to print all expressions that match a regular expression

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Feb 8 05:19:34 EST 2010


En Mon, 08 Feb 2010 02:17:59 -0300, hzhuo1 at gmail.com <hzhuo1 at gmail.com>
escribió:

>> Please check out this example on the pyparsing wiki,  
>> invRegex.py:http://pyparsing.wikispaces.com/file/view/invRegex.py.  
>>  This code
>> implements a generator that returns successive matching strings for
>> the given regex.  [...]
>> Of course, as other posters have pointed out, this inverter does not
>> accept regexen with unbounded multiple characters '+' or '*', but '?'
>> and "{min,max}" notation will work.  Even '.' is supported, although
>> this can generate a large number of return values.
>
> Thanks very much. This is exactly what I need now. I will check this
> function.
>

Here you have another approach based on [1]. This is a generator-based
approach, yielding all strings in increasing length order. In principle it
can handle unbounded repetitions, except as written the maximum recursion
limit is shortly reached (the original code is in Haskell, I almost
blindly translated it into Python; certainly it can be rewritten more
efficiently)

You have to parse the R.E. and generate the corresponding function calls
to the merge/prod/closure functions -- pyparsing certainly can help with
that. "ab" becomes prod(a,b), "a|b" becomes merge(a,b), and "a*" becomes
closure(a)

By example, to find the language defined by this expression "(a|bc)*d",
one has to evaluate:

prod(
     closure(
       merge(['a'],
             prod(['b'],['c']))),
     ['d']
)

wich yields these strings:

d
ad
aad
bcd
aaad
abcd
bcad
...

bcbcbcbcaad
bcbcbcbcbcd
aaaaaaaaaaad

and after 234 results aborts with a recursion error :(

[1] http://www.cs.utexas.edu/users/misra/Notes.dir/RegExp.pdf

-- 
Gabriel Genellina
-------------- next part --------------
A non-text attachment was scrubbed...
Name: enumerate_regular_language.py
Type: application/octet-stream
Size: 2732 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-list/attachments/20100208/0ed12f59/attachment.obj>


More information about the Python-list mailing list