regex recursive matching (regex 2015.07.19)
Terry Reedy
tjreedy at udel.edu
Tue Aug 18 11:27:43 EDT 2015
On 8/18/2015 10:25 AM, Neal Becker wrote:
> Trying regex 2015.07.19
>
> I'd like to match recursive parenthesized expressions, with groups such that
> '(a(b)c)'
Extended regular expressions can only match strings in extended regular
languages. General nested expressions are too general for that. You
need a context-free parser. You can find them on pypi or write your
own, which in this case is quite simple.
---
from xploro.test import ftest  # my personal function test function

io_pairs = (
    ('abc', []),
    ('(a)', [(0, '(a)')]),
    ('a(b)c', [(1, '(b)')]),
    ('(a(b)c)', [(0, '(a(b)c)'), (2, '(b)')]),
    ('a(b(cd(e))(f))g', [(1, '(b(cd(e))(f))'), (3, '(cd(e))'),
                         (6, '(e)'), (10, '(f)')]),
    )

def parens(text):
    '''Return sorted list of paren tuples for text.

    Paren tuple is start index (for sorting) and substring.
    '''
    opens = []
    parens = set()
    for i, char in enumerate(text):
        if char == '(':
            opens.append(i)
        elif char == ')':
            start = opens.pop()
            parens.add((start, text[start:(i+1)]))
    return sorted(parens)

ftest(parens, io_pairs)
---
all pass
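
For anyone who wants to try this without my ftest helper, here is a minimal self-contained sketch of the same stack idea. The name paren_groups and the ValueError on unbalanced input are assumptions added for illustration, not part of the code above. That code raises IndexError on a stray ')' and silently ignores an unclosed '('.

```python
def paren_groups(text):
    """Return sorted (start, substring) pairs, one per balanced paren group.

    Hypothetical variant of the stack approach: unbalanced input raises
    ValueError instead of failing with IndexError or being ignored.
    """
    opens = []   # stack of indices of '(' characters not yet closed
    groups = []
    for i, char in enumerate(text):
        if char == '(':
            opens.append(i)
        elif char == ')':
            if not opens:
                raise ValueError(f"unmatched ')' at index {i}")
            start = opens.pop()
            groups.append((start, text[start:i + 1]))
    if opens:
        raise ValueError(f"unmatched '(' at index {opens[-1]}")
    return sorted(groups)


# Plain asserts stand in for the ftest helper.
cases = (
    ('abc', []),
    ('(a(b)c)', [(0, '(a(b)c)'), (2, '(b)')]),
)
for text, expected in cases:
    assert paren_groups(text) == expected, (text, paren_groups(text))
```

Groups close innermost first, so the final sort by start index is what puts the outermost group at the front, in the order the original question asked for.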
> would give
> group(0) -> '(a(b)c)'
> group(1) -> '(b)'
>
> but that's not what I get
>
> import regex
>
> #r = r'\((?>[^()]|(?R))*\)'
> r = r'\(([^()]|(?R))*\)'
> #r = r'\((?:[^()]|(?R))*\)'
> m = regex.match (r, '(a(b)c)')
>
> m.groups()
> Out[28]: ('c',)
>
--
Terry Jan Reedy