[Python-Dev] hierarchical named groups extension to the re library

Mon Apr 4 03:17:17 CEST 2005

Greetings Chris,

> Well, that would be something I'd want to discuss here.  As I'm not
> sure if I actually ~want~ to match the API of the re module.

If this feature is considered a good addition to the standard
library, integrating it into re would be an interesting option.
But given what you say above, I'm not sure whether *you* want to
make it part of re itself.

[...]
> IMO If you don't bother to name a group then you probably aren't going
> to be interested in it anyway - so why keep a reference to it?

That's not true. There's a lot of code out there that makes
legitimate use of unnamed groups. The (?:...) syntax is what
you use when the group's content is not needed.
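
To make the distinction concrete, here is a minimal sketch using the
stock re module. The sample string and variable names are made up
for illustration only.

```python
import re

# Hypothetical sample input, chosen only for illustration.
line = "12 drummers drumming"

# Unnamed capturing groups: both parts are kept and addressed by position.
capturing = re.match(r"(\d+) (\w+)", line)
pair = capturing.groups()

# (?:...) groups without capturing: it only structures the pattern,
# so the number is matched but not kept.
non_capturing = re.match(r"(?:\d+) (\w+)", line)
only_word = non_capturing.groups()

print(pair)
print(only_word)
```

Code that relies on positional groups like the first pattern would
lose information if unnamed groups were silently dropped.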

> If you only wanted to extract the numbers from those verses...
> 
> >>> regex='^(((?P<number>\d+) ([^,]+))(, )?)*$'
> >>> pat2=re2.compile(regex)
> >>> x=pat2.extract(buf)
> >>> x
> {'number': ['12', '11', '10']}
> 
> Before the compression stage the _Match object actually looked like this:
> 
> {'_group0': {'_value': '12 drummers drumming, 11 pipers piping, 10
> lords
[...]
> '10'}}]}}
> 
> But the compression algorithm collected the named groups and brought
> them to the surface, to return the much nicer looking:
> 
> {'number': ['12', '11', '10']}

I confess I haven't thought about how that could be cleanly
implemented, but both outputs you present above look inadequate
in my opinion. Regular expressions already have a widely adopted
meaning. If we're going to introduce new features, we should try
to do so without breaking the well-known meanings they currently
have.
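
As a rough illustration of the current meaning, the sketch below runs
the quoted pattern through the stock re module. The contents of buf
are my assumption, modeled on the quoted verses, and are not the
original test data.

```python
import re

# Assumed sample text modeled on the quoted verses; not the original buf.
buf = "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"

# With stock re, a group inside a repetition reports only its last match.
m = re.match(r"^(((?P<number>\d+) ([^,]+))(, )?)*$", buf)
last_number = m.group("number")

# Collecting every occurrence is done today by iterating over matches
# of a simpler, non-repeated pattern.
all_numbers = [
    item.group("number")
    for item in re.finditer(r"(?P<number>\d+) [^,]+", buf)
]

print(last_number)
print(all_numbers)
```

Existing code depends on a repeated group yielding a single string, so
returning a list for the same pattern would change what people expect
from a match.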

> > I find the feature very interesting, but being used to live without it,
> > I have difficulty evaluating its usefulness.
> 
> Yes - this is a good point too, because it ~is~ different from the re
> library.  re2 aims to do all that searching, grouping, iterating and
> collecting and constructing work for you.
[...]
> Actually, I ~would~ like to limit it to just named groups.
> I reckon, if you're not going to bother naming a group, then why would
> you have any interest in it.
> I guess it's up for discussion how confusing this "new" way of thinking
> could be and what drawbacks it might have.

Your target does indeed seem to be a new kind of regular expression.
In that case, I'm not sure "re2" is the right name for it, given
that you haven't written an improved SRE, but a completely new
kind of regular expression matching that depends on SRE itself
rather than extending it in a compatible way.

While I would like to see *some* kind of successive matching
implemented in SRE (besides the Scanner which is already available),
I'm not in favor of that specific implementation.
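
For reference, a minimal sketch of the Scanner mentioned above. Note
that re.Scanner is undocumented, so treat this as an illustration
rather than a supported API. The token labels and the sample text are
my own assumptions.

```python
import re

# Each lexicon entry pairs a pattern with an action. The action receives
# the scanner and the matched text; None means "skip this text".
scanner = re.Scanner([
    (r"\d+", lambda s, tok: ("number", tok)),
    (r"[A-Za-z-]+", lambda s, tok: ("word", tok)),
    (r"[,\s]+", None),
])

# scan() matches the lexicon repeatedly from the start of the string and
# returns the collected results plus whatever text could not be matched.
tokens, rest = scanner.scan("12 drummers drumming, 11 pipers piping")

numbers = [value for kind, value in tokens if kind == "number"]
print(tokens)
print(numbers)
```

This does successive matching without changing what a group means in
an ordinary pattern.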

I'm open to discussing that further.

-- 
Gustavo Niemeyer
http://niemeyer.net