
Nicolas Fleury <nidoizo@yahoo.com> wrote:
ottrey@py.redsoft.be wrote:
>>> import re2
>>> buf='12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
>>> regex='^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'
>>> pat2=re2.compile(regex)
>>> x=pat2.extract(buf)
If one wanted to match the API of the re module, one should use pat2.findall(buf), which would return a list of 'hierarchical match objects', though with the above, one should really return a list of 'verse' items (the way the regular expression is written).
>>> x
{'verse': [{'number': '12', 'activity': 'drummers drumming'}, {'number': '11', 'activity': 'pipers piping'}, {'number': '10', 'activity': 'lords a-leaping'}]}
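For anyone without re2 installed, a rough sketch of how the same nested result could be approximated with the standard re module. It assumes the verses can be matched one at a time with a simpler per-verse pattern (verse_pat is a name made up for this example), which is not how re2 itself works:

```python
import re

buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'

# Assumption: one pattern per verse. finditer() visits every verse,
# whereas a repeated group in a single stdlib match keeps only the last.
verse_pat = re.compile(r'(?P<number>\d+) (?P<activity>[^,]+)')

# Collect one dictionary per verse under the 'verse' key.
x = {'verse': [m.groupdict() for m in verse_pat.finditer(buf)]}
print(x)
```

This only works because the verses are flat; a pattern with deeper nesting would need one such pass per level.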
Is a dictionary the right container, or should another class be used? In the example, the content of the "verse" group itself is lost; only its sub-groups remain. Something like a hierarchical MatchObject could provide access to both pieces of information: the sub-groups and the group itself.
Its contents are not lost, look at the overall dictionary... In any case, I think one can do better than a dictionary.
>>> x=pat2.match(buf) #or x=pat2.findall(buf)[0]
>>> x
'12 drummers drumming,'
>>> dir(x)
['verse']
>>> x.verse
'12 drummers drumming,'
>>> dir(x.verse)
['number', 'activity']
>>> x.verse.number
'12'
>>> x.verse.activity
'drummers drumming'
...would get my vote (or using obj.group(i) semantics I discuss below). I notice that this is basically what the re2 module already does (having read the web page), though rather than...
>>> pat2.extract(buf).verse[1].activity
'pipers piping'
I would prefer...
>>> pat2.findall(buf)[1].verse.activity
'pipers piping'
For .verse[1] or .verse[2] to make sense, it implies that the pattern is something like...
    ((?P<verse>... )(?P<verse>...))
...which it isn't. I understand that the decision was probably made to make it similar to the case of...
    ((?P<foo>... (?P<goo>...)+))
...where multiple matches for goo would require x.foo.goo[i].
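To make the attribute-style access concrete, here is a minimal sketch of a string that also carries its sub-groups as attributes. GroupNode, find_verses and VERSE are hypothetical names invented for this illustration; none of this is re2's actual implementation, and the per-verse pattern is an assumption standing in for the full hierarchical regex:

```python
import re

class GroupNode(str):
    """Hypothetical matched string that exposes sub-groups as attributes."""

    def __new__(cls, text, **subgroups):
        node = super().__new__(cls, text)
        node._names = list(subgroups)
        for name, value in subgroups.items():
            # Caveat: a group named like a str method (e.g. 'count')
            # would shadow that method on this instance.
            setattr(node, name, value)
        return node

    def names(self):
        # Sub-group names at this level only, in pattern order.
        return list(self._names)

# Assumed per-verse pattern, applied repeatedly with finditer().
VERSE = re.compile(r'(?P<number>\d+) (?P<activity>[^,]+)')

def find_verses(buf):
    """Return one top-level node per verse, each with a .verse child."""
    results = []
    for m in VERSE.finditer(buf):
        parts = {name: GroupNode(text) for name, text in m.groupdict().items()}
        verse = GroupNode(m.group(0), **parts)
        results.append(GroupNode(m.group(0), verse=verse))
    return results

buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
matches = find_verses(buf)
print(matches[1].verse.activity)
print(matches[0].verse.names())
```

Subclassing str keeps each node usable as the matched text while still allowing node.verse.number style access; a separate class with a .text attribute would avoid the name-shadowing caveat at the cost of that convenience.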
Also, should it be limited to named groups?
Probably not. I would suggest using matchobj.group(i) semantics to match the standard re module semantics, though only allow returning items in the current level of the hierarchy. That is, one could use x.verse.group(1) and get back '12', but x.group(1) would return '12 drummers drumming'.
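A small sketch of what level-restricted group(i) could look like. LevelMatch is a hypothetical class written for this message, and the tree is assembled by hand from an ordinary stdlib match on a single verse, so it illustrates the proposed semantics rather than any existing API:

```python
import re

class LevelMatch:
    """Hypothetical node: group(i) sees only this node's direct sub-groups."""

    def __init__(self, text, children=()):
        self.text = text
        self.children = list(children)

    def group(self, i=0):
        # group(0) is the node's own text, as in the stdlib re module.
        if i == 0:
            return self.text
        # Numbering restarts at each level of the hierarchy.
        return self.children[i - 1].text

# Assumption: match a single verse and build the hierarchy manually.
m = re.match(r'(?P<verse>(?P<number>\d+) (?P<activity>[^,]+))',
             '12 drummers drumming, 11 pipers piping')
verse = LevelMatch(m.group('verse'),
                   [LevelMatch(m.group('number')),
                    LevelMatch(m.group('activity'))])
x = LevelMatch(m.group(0), [verse])

print(x.group(1))      # the whole verse: the only group at the top level
print(verse.group(1))  # the number: first group one level down
```

Here x.group(2) raises IndexError, because the top level has only one direct sub-group; number and activity are reachable only through the verse node.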
I am wondering what would be the best direction to take this project in.
Firstly, is it (or can it be made) useful enough to be included in the Python stdlib? (i.e., should I bother writing a PEP for it?)
And if so, would it be best to merge its functionality in with the re library, or to leave it as a separate module?
And, also are there any suggestions/criticisms on the library itself?
I find the feature very interesting, but being used to living without it, I have difficulty evaluating its usefulness. However, it reminds me how strange I found it at first that only the last match was kept, so I think, FWIW, that from a purist point of view the functionality would make sense in the stdlib in some way or another.
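For reference, the "only the last match was kept" behaviour is easy to reproduce with the standard re module and the pattern from the start of this thread (written here as a raw string):

```python
import re

buf = '12 drummers drumming, 11 pipers piping, 10 lords a-leaping'
regex = r'^((?P<verse>(?P<number>\d+) (?P<activity>[^,]+))(, )?)*$'

m = re.match(regex, buf)

# The outer group repeats three times, but each named group
# reports only the text from the final repetition.
print(m.group('verse'))   # 10 lords a-leaping
print(m.group('number'))  # 10
```

The first two verses are matched and consumed, but nothing in the resulting match object gives access to them.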
re2 can be used as a limited structural parser. This makes the re module useful for more things than it is currently. The question of whether it belongs in the standard library, however, should I think be decided based on the criteria used previously (whatever they were).

 - Josiah