Get nesting of regex groups

Mattias Ugelvik uglemat at gmail.com
Wed Apr 8 23:58:59 CEST 2015


I'm making a 'declarative string manipulation' tool whose interface
should work like this:

>>> rules(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)', {
...   'separate': '.suffix',
...   'inner': 'abc',
...   'outer': lambda string: 'some-{}-manipulation'.format(string)
... }).apply('a')
'some-abc-manipulation.suffix'
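For illustration, here is a minimal Python sketch of what `apply` could do with a single match, assuming the nesting of the groups is already known. `apply_rules` and its `parents` argument are hypothetical names, not part of any library or of the tool above. Nested groups are rewritten first, and the enclosing group's rule receives the rewritten text, so 'inner' is replaced before 'outer' sees it.

```python
import re


def apply_rules(pattern, rules, text, parents):
    """Rewrite the first match of `pattern` in `text`, innermost groups first.

    Hypothetical sketch. `rules` maps a group name to a replacement string or
    to a function of the group's (already rewritten) text. `parents` maps each
    named group to its enclosing named group, or None at the top level; it is
    passed in by hand because a match object cannot supply it.
    """
    m = re.search(pattern, text)
    if m is None:
        return text

    def children(parent):
        # Groups directly under `parent` that took part in the match, left to
        # right; sorted() is stable, so equal spans keep the order of `parents`.
        found = [g for g, p in parents.items() if p == parent and m.start(g) != -1]
        return sorted(found, key=m.span)

    def rebuild(start, end, parent):
        # Copy text[start:end], splicing in the rewritten child groups.
        pieces, pos = [], start
        for child in children(parent):
            child_start, child_end = m.span(child)
            pieces.append(text[pos:child_start])
            pieces.append(rewrite(child))
            pos = child_end
        pieces.append(text[pos:end])
        return ''.join(pieces)

    def rewrite(name):
        # Nested groups are rewritten first; this group's rule sees the result.
        inner_text = rebuild(*m.span(name), name)
        rule = rules.get(name)
        if rule is None:
            return inner_text
        return rule(inner_text) if callable(rule) else rule

    return text[:m.start()] + rebuild(m.start(), m.end(), None) + text[m.end():]


pattern = r'(?P<outer>(?P<inner>a?))(?P<separate>b?)'
rules = {
    'separate': '.suffix',
    'inner': 'abc',
    'outer': lambda string: 'some-{}-manipulation'.format(string),
}
parents = {'outer': None, 'inner': 'outer', 'separate': None}

print(apply_rules(pattern, rules, 'a', parents))  # some-abc-manipulation.suffix
```

With `parents` given explicitly, the empty input `''` also produces 'some-abc-manipulation.suffix'; the hard part is obtaining that mapping in the first place.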

Since the 'inner' group is nested, it should be replaced first; the
replacement function for 'outer' then receives the result and continues
from there. When 'inner' matches the empty string and its span is
identical to that of 'outer', I need to know whether it is nested inside
'outer' or lies outside it, like 'separate'.
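The ambiguity can be reproduced with the standard `re` module. This sketch (the helpers `spans` and `looks_nested` are made up for illustration) compares group spans with a naive containment test. It separates 'separate' from 'outer' while 'separate' matches something, but once every group matches the empty string all spans are (0, 0) and 'separate' looks nested in 'outer' as well. Note that 'inner' and 'outer' share a span even for non-empty input, so spans alone cannot say which of the two is the parent.

```python
import re

pattern = re.compile(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)')


def spans(text):
    # Span of every named group for a full match of `text`.
    m = pattern.fullmatch(text)
    return {name: m.span(name) for name in pattern.groupindex}


def looks_nested(child, parent):
    # Naive test: is span `child` contained in span `parent`?
    return parent[0] <= child[0] and child[1] <= parent[1]


non_empty = spans('ab')  # outer=(0, 1), inner=(0, 1), separate=(1, 2)
empty = spans('')        # every group has span (0, 0)

print(looks_nested(non_empty['inner'], non_empty['outer']))     # True
print(looks_nested(non_empty['separate'], non_empty['outer']))  # False
print(looks_nested(empty['separate'], empty['outer']))          # True, wrongly
```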

> Pardon me for stating the obvious,

No problem, I can see why my question is weird. I actually implemented
the interface above before I realized that these ambiguities even
existed.
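One way out is to read the nesting from the pattern text instead of from a match. The parser that `re` uses internally is a private module, so this is a minimal, hypothetical helper (`named_group_parents` is not a library function) that scans the pattern, skipping escapes and character classes, and records each named group's innermost enclosing named group. It assumes a pattern that `re.compile` accepts, and it does not handle verbose-mode comments or `(?#...)` comments that contain parentheses.

```python
import re


def named_group_parents(pattern):
    """Map each named group to its innermost enclosing named group (or None).

    Hypothetical helper: scans the pattern source; verbose-mode comments and
    (?#...) comments containing parentheses are not handled.
    """
    parents = {}
    stack = []  # one entry per open '(': the group name, or None if unnamed
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == '\\':
            i += 2  # skip the escaped character
            continue
        if c == '[':
            # Skip a character class; ']' right after '[' or '[^' is literal.
            i += 1
            if i < n and pattern[i] == '^':
                i += 1
            if i < n and pattern[i] == ']':
                i += 1
            while i < n and pattern[i] != ']':
                i += 2 if pattern[i] == '\\' else 1
            i += 1  # step past the closing ']'
            continue
        if c == '(':
            name = None
            if pattern.startswith('(?P<', i):
                name = pattern[i + 4:pattern.index('>', i)]
                enclosing = [g for g in stack if g is not None]
                parents[name] = enclosing[-1] if enclosing else None
            stack.append(name)
        elif c == ')':
            stack.pop()
        i += 1
    return parents


compiled = re.compile(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)')
print(named_group_parents(compiled.pattern))
# {'outer': None, 'inner': 'outer', 'separate': None}
```

Because the nesting comes from the pattern itself, it stays the same when the groups match the empty string.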

On 08/04/2015, Denis McMahon <denismfmcmahon at gmail.com> wrote:
> On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:
>
>> Example: re.compile('(?P<outer>(?P<inner>a))')
>>
>> How can I detect that 'inner' is a nested group of 'outer'? I know that
>> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
>> your help earlier:
>> https://mail.python.org/pipermail/python-list/2015-April/701594.html).
>
> Pardon me for stating the obvious, but as the person defining the re, and
> assuming you haven't generated another sub-pattern somewhere in the same
> re with the same name, how can inner ever not be a nested group of outer?
>
> Even in the contrived example below, it is clear that the list of tuples
> generated by findall is of the form:
>
> tuple[0] = 'outer', tuple[1] = 'inner'
>
> from the order of matches principle.
>
> --------------------------------
>
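> #!/usr/bin/env python3
>
> import re
>
> patt = re.compile(r'(?P<outer>a+(?P<inner>b+))')
>
> # findall returns one (outer, inner) tuple per match
> result = patt.findall('abaabbaaabbbaaaabbbb')
>
> print(result)
> # [('ab', 'b'), ('aabb', 'bb'), ('aaabbb', 'bbb'), ('aaaabbbb', 'bbbb')]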
>
> --------------------------------
>
> However, if all you are doing is using .search or .match for the first
> match of the pattern, then there should be no scope for confusion anyway.
>
> --
> Denis McMahon, denismfmcmahon at gmail.com
>
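Denis's ordering argument shows which group opens first, but group numbers do not record containment. A quick check with the standard `re` module (an illustration added here, not from the thread) shows that a nested pattern and a sibling pattern produce the same `groupindex` and the same `findall` tuples:

```python
import re

nested = re.compile(r'(?P<outer>(?P<inner>a))')
siblings = re.compile(r'(?P<outer>a)(?P<inner>a)')

# Group numbers follow the order of opening parentheses in both patterns,
# so groupindex alone cannot distinguish nesting from siblings.
print(dict(nested.groupindex))    # {'outer': 1, 'inner': 2}
print(dict(siblings.groupindex))  # {'outer': 1, 'inner': 2}

# findall yields (outer, inner) tuples of the same shape for both.
print(nested.findall('a'))     # [('a', 'a')]
print(siblings.findall('aa'))  # [('a', 'a')]
```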


