
I've written an extension to the re library that provides more complete matching of hierarchical named groups in regular expressions. I've set up a SourceForge project for it: http://pyre2.sourceforge.net/

re2 extracts a hierarchy of named-group matches from a string, rather than the flat, incomplete dictionary that the standard re module returns. (That is, the re library returns only the ~last~ match for each named group, not a list of ~all~ the matches for that group, and the hierarchy of the named groups is lost in the flat dictionary of matches that results.) For example:

>>> import re
>>> buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex = r'^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat1 = re.compile(regex)
>>> m = pat1.match(buf)
>>> m.groupdict()
{'verse': '10 lords a-leaping', 'number': '10', 'activity': 'lords a-leaping'}

>>> import re2
>>> buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex = r'^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat2 = re2.compile(regex)
>>> x = pat2.extract(buf)
>>> x
{'verse': [{'number': '12', 'activity': 'drummers drumming'}, {'number': '11', 'activity': 'pipers piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}

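For comparison, a similar list can be approximated for this particular string with the standard re module alone, by matching one verse at a time with finditer(). This is only a rough sketch, not how re2 works internally: verse_pat and extract_verses are names made up for the illustration, and unlike the anchored pattern above it does not check that the whole string is well formed.

```python
import re

buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'

# Hypothetical helper pattern: just the inner part of the full regex,
# without the ^...$ anchors or the outer repetition.
verse_pat = re.compile(r'(?P<number>\d+) (?P<activity>[^,]+)')

def extract_verses(text):
    """Collect the named groups of every verse, not just the last one."""
    return {'verse': [m.groupdict() for m in verse_pat.finditer(text)]}

print(extract_verses(buf))
```

This works here only because the repetition is flat and written out by hand; nested repeats would need one such loop per level.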
(See http://pyre2.sourceforge.net/ for more details.)

I am wondering what would be the best direction to take this project in. Firstly, is it (or can it be made) useful enough to be included in the Python stdlib (i.e., should I bother writing a PEP for it)? If so, would it be best to merge its functionality into the re library, or to leave it as a separate module? Also, are there any suggestions or criticisms of the library itself?