Two RE proposals/Group Variables

Donald McCarthy paddy3118 at tiscali.co.uk
Sat Jul 27 16:11:16 EDT 2002


Strange you should post this...

I was thinking of a similar idea, but as a self-contained extension to the re syntax.

Groups, it seems, capture the text matched by the enclosed section of the re.
References to groups are references to the text matched by that grouped section.
When the re engine scans a group, it associates the group's name with the text
that was matched.


How about having 'meta-groups', or 'group variable' definitions, within the re engine?
A group variable definition would use the re extension syntax, e.g. (?V<name>...), much
like the definition of a named group. When scanned, however, it would consume no characters;
instead, the re engine would associate the name with the enclosed regular expression fragment.

Group variables would have to be defined before they could be instantiated.
(?V=<name>) would be expanded to the saved re associated with the variable <name> and used
as if it defined a new un-named group in the re.
For example:
    extended_re.compile(r"(?V<dinner>spam|eggs)xxxx(?V=<dinner>)")
would be equivalent to:
    re.compile(r"xxxx(spam|eggs)")
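The definition/use mechanism above can be prototyped as a source-to-source rewrite on top of today's re module. This is a minimal sketch, not the proposal's implementation: `expand` and `_group_end` are hypothetical helper names, a definition "consumes no characters" simply by being dropped from the output, and the scan assumes the (?V...) forms never appear inside a character class.

```python
import re

# Hypothetical syntax from the post (not supported by the real re module):
#   (?V<name>...)  defines a group variable and consumes no characters
#   (?V=<name>)    instantiates it as a new un-named group
_DEFINE = re.compile(r"\(\?V<(\w+)>")
_USE = re.compile(r"\(\?V=<(\w+)>\)")


def _group_end(pattern, start):
    """Return the index just past the ')' that closes the '(' at pattern[start]."""
    depth, i, n = 0, start, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\":
            i += 2  # skip an escaped character such as \( or \)
            continue
        if c == "[":
            # Skip a character class; a ']' straight after '[' or '[^' is literal.
            i += 1
            if i < n and pattern[i] == "^":
                i += 1
            if i < n and pattern[i] == "]":
                i += 1
            while i < n and pattern[i] != "]":
                i += 2 if pattern[i] == "\\" else 1
            i += 1  # step past the closing ']'
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced parentheses in pattern")


def expand(pattern, variables=None):
    """Rewrite group-variable syntax into an ordinary re pattern string."""
    if variables is None:
        variables = {}
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "\\":
            out.append(pattern[i:i + 2])  # copy escapes unchanged
            i += 2
            continue
        m = _DEFINE.match(pattern, i)
        if m:
            end = _group_end(pattern, i)
            # Save the fragment (expanding any uses inside it); emit nothing.
            variables[m.group(1)] = expand(pattern[m.end():end - 1], variables)
            i = end
            continue
        m = _USE.match(pattern, i)
        if m:
            # A KeyError here means the variable was used before it was defined.
            out.append("(" + variables[m.group(1)] + ")")
            i = m.end()
            continue
        out.append(pattern[i])
        i += 1
    return "".join(out)


print(expand(r"(?V<dinner>spam|eggs)xxxx(?V=<dinner>)"))  # xxxx(spam|eggs)
```

Because the same `variables` dict is threaded through the recursive call, a use inside a later definition sees every variable defined before it.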

(?P<groupname!name>) would be equivalent to defining the named group <groupname>,
except that the enclosed re is the one stored in the group variable <name>.
For example:
    extended_re.compile(r"(?V<dinner>spam|eggs)xxxx(?P<yummy!dinner>)")
would be equivalent to:
    extended_re.compile(r"(?V<dinner>spam|eggs)xxxx(?P<yummy>(?V=<dinner>))")
which should work like
    re.compile(r"xxxx(?P<yummy>spam|eggs)")
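The (?P<groupname!name>) form reduces to an ordinary named group once the variable's fragment is known. A minimal sketch, assuming the group variables have already been collected into a plain dict; `instantiate_named` is a hypothetical name, and only the named form is handled here.

```python
import re

# Hypothetical form from the post: (?P<groupname!name>) defines the named
# group `groupname` whose body is the fragment stored in group variable `name`.
_NAMED_INSTANCE = re.compile(r"\(\?P<(\w+)!(\w+)>\)")


def instantiate_named(pattern, variables):
    """Replace each (?P<group!var>) with (?P<group>fragment-of-var)."""
    return _NAMED_INSTANCE.sub(
        lambda m: f"(?P<{m.group(1)}>{variables[m.group(2)]})", pattern
    )


variables = {"dinner": "spam|eggs"}  # as if defined by (?V<dinner>spam|eggs)
pattern = instantiate_named(r"xxxx(?P<yummy!dinner>)", variables)
print(pattern)  # xxxx(?P<yummy>spam|eggs)
print(re.match(pattern, "xxxxspam").group("yummy"))  # spam
```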


Your example would become:
    wordpunct = extended_re.compile(r"(?V<word>\w+)(?V<punct>[,.;?])(?P<!word>)(?P<!punct>)")


(It's around now that I realize why I hadn't posted earlier :-) )

Notice the further extension: I allow the group name before the exclamation mark (!) to be
omitted, in which case the fragment acts as if it were the definition of an un-named group.
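For comparison, the current re module has no group variables, so the closest equivalent of the wordpunct example is to build the pattern string in Python and wrap each fragment in a plain capturing group, which is what the un-named (?P<!word>) and (?P<!punct>) forms would produce under this proposal. A small sketch:

```python
import re

# Today's re has no group variables; compose the pattern from plain strings.
word = r"\w+"
punct = r"[,.;?]"
# Each fragment becomes an un-named capturing group, mirroring (?P<!word>)
# and (?P<!punct>) in the proposed syntax.
wordpunct = re.compile(f"({word})({punct})")

m = wordpunct.match("spam,")
print(m.groups())  # ('spam', ',')
```

The proposal's advantage over this workaround is that the fragments live inside the pattern itself, so they could travel to any language that adopted the syntax.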

Hierarchical composition:

    r"(?V<beer>budvar|stella)(?V<dinner>pizza|curry)(?V<food>(?V=<beer>)|(?V=<dinner>))"

   (Hey, I can leave the beer alone :-) )
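Hierarchical composition amounts to expanding the uses inside a definition recursively. The sketch below assumes the three definitions from the pattern above have been gathered into a dict; `resolve` is a hypothetical helper, and its self-reference check is an extra safeguard, not part of the proposal.

```python
import re

# Hypothetical use syntax from the post: (?V=<name>)
_USE = re.compile(r"\(\?V=<(\w+)>\)")


def resolve(name, definitions, _active=()):
    """Fully expand variable `name`, following nested (?V=<...>) uses."""
    if name in _active:
        raise ValueError(f"group variable {name!r} refers to itself")
    fragment = definitions[name]
    # Each nested use becomes an un-named group around its expanded fragment.
    return _USE.sub(
        lambda m: "(" + resolve(m.group(1), definitions, _active + (name,)) + ")",
        fragment,
    )


definitions = {
    "beer": "budvar|stella",
    "dinner": "pizza|curry",
    "food": "(?V=<beer>)|(?V=<dinner>)",
}
food = resolve("food", definitions)
print(food)  # (budvar|stella)|(pizza|curry)
print(re.fullmatch(food, "curry") is not None)  # True
```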

I guess the match object would have to be extended to return associations between
variable names and the groups created from them by the ! syntax.
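One possible shape for those associations, sketched under two assumptions: the variables are already in a dict, and each un-named instantiation receives a synthetic group name such as word_1 so it can be looked up afterwards (the proposal itself leaves these groups un-named). `compile_with_associations` and `matches_by_variable` are hypothetical names.

```python
import re

# Hypothetical ! syntax: (?P<group!var>) or (?P<!var>) instantiates variable `var`.
_BANG = re.compile(r"\(\?P<(\w*)!(\w+)>\)")


def compile_with_associations(pattern, variables):
    """Compile `pattern`, returning (regex, {variable: [group names]})."""
    associations = {}

    def instantiate(m):
        group, var = m.group(1), m.group(2)
        names = associations.setdefault(var, [])
        if not group:
            # Assumption: un-named instantiations get a synthetic name so the
            # association can be reported later.
            group = f"{var}_{len(names) + 1}"
        names.append(group)
        return f"(?P<{group}>{variables[var]})"

    return re.compile(_BANG.sub(instantiate, pattern)), associations


def matches_by_variable(match, associations):
    """Map each variable name to the texts matched by its instantiations."""
    return {var: [match.group(name) for name in names]
            for var, names in associations.items()}


variables = {"word": r"\w+", "punct": r"[,.;?]"}
regex, assoc = compile_with_associations(
    r"(?P<!word>)\s(?P<!word>)(?P<!punct>)", variables)
m = regex.match("hello world,")
print(matches_by_variable(m, assoc))  # {'word': ['hello', 'world'], 'punct': [',']}
```

This is the kind of 'higher level' result the proposal hints at: all text matched by any instantiation of word, gathered under one name.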

The main advantage I see is that it is a proposed extension to the re syntax itself, and so
could be considered by all the languages that use Perl-like regular expression matching.

Searches could return information about matches against the various instantiations of a
named variable, giving 'higher level' information.

David LeBlanc wrote:

  > 1. Add a substitution operator - in the example below it's "!<..>"
  >
  >     word = r"\w*"
  >     punct = r"[,.;?]"
  >     wordpunct = re.compile(r"!<word>!<punct>")
  >
  > The re compiler sees r"\w*[,.;?]". Trivial example, but for fancier patterns it would
  > be great IMO. A substitution pass should be done over the substituted text for
  > nesting:
  >
  >     if = r"if"
  >     term = r"something"
  >     num = r"\d*"
  >     op = r"[-+*/]"
  >     factor = r"!<num>\s*!<op>\s*!<num>"
  >     expr = r"!<term>|!<factor>"
  >     if_stmt = re.compile(r"!<if>\s*\(?\s*!<expr>\s*\)?\s*:")
  >
  > (this is just a muddle to give the idea)
  >
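The substitution operator quoted above can be approximated today with repeated substitution passes that run until no !<name> references remain, as suggested for nesting. In this sketch, `substitute` is a hypothetical helper, and a dict replaces the separate variables, which also sidesteps the fact that `if` is a reserved word in Python.

```python
import re

# The quoted proposal's reference syntax: !<name>
_REF = re.compile(r"!<(\w+)>")


def substitute(pattern, names, max_passes=20):
    """Replace !<name> references, repeating passes until none remain."""
    for _ in range(max_passes):
        # Wrap each fragment in a non-capturing group so operators such as |
        # inside it cannot leak into the surrounding pattern.
        expanded = _REF.sub(lambda m: "(?:" + names[m.group(1)] + ")", pattern)
        if expanded == pattern:
            return expanded
        pattern = expanded
    raise ValueError("substitution did not settle; possible self-reference")


names = {
    "if": r"if",
    "term": r"something",
    "num": r"\d*",
    "op": r"[-+*/]",
    "factor": r"!<num>\s*!<op>\s*!<num>",
    "expr": r"!<term>|!<factor>",
}
if_stmt = re.compile(substitute(r"!<if>\s*\(?\s*!<expr>\s*\)?\s*:", names))
print(bool(if_stmt.match("if (1 + 2):")))  # True
```

Wrapping each fragment in (?:...) departs from plain text pasting on purpose: pasted raw, the | inside expr would split the whole if_stmt pattern into two unrelated alternatives.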

More information about the Python-list mailing list