[Python-Dev] hierarchical named groups extension to the re library

Gustavo Niemeyer gustavo at niemeyer.net
Mon Apr 4 03:17:17 CEST 2005

Greetings Chris,

> Well, that would be something I'd want to discuss here, as I'm not
> sure I actually ~want~ to match the API of the re module.

If this feature is considered a good addition to the standard
library, integrating it into re would be an interesting option.
But given what you say above, I'm not sure if *you* want to
make it a part of re itself.

> IMO if you don't bother to name a group then you probably aren't going
> to be interested in it anyway - so why keep a reference to it?

That's not true. There's a lot of code out there that genuinely
uses unnamed groups. The non-capturing syntax (?:...) is what's
used when the group's content is not needed.
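To illustrate the distinction with the standard re module: plain
parentheses create numbered capturing groups that code reads back by
position or references in a substitution, while (?:...) only groups
without capturing. The date string is just an arbitrary sample input.

```python
import re

text = "2005-04-04"

# Plain parentheses capture even without a name; groups are numbered.
captured = re.match(r"(\d+)-(\d+)-(\d+)", text)
print(captured.groups())  # ('2005', '04', '04')

# Numbered groups are routinely referenced by position, e.g. in substitutions.
print(re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", text))  # 04/04/2005

# (?:...) groups for structure without capturing, so it takes no number.
uncaptured = re.match(r"(?:\d+)-(\d+)-(\d+)", text)
print(uncaptured.groups())  # ('04', '04')
```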

> If you only wanted to extract the numbers from those verses...
> >>> regex='^(((?P<number>\d+) ([^,]+))(, )?)*$'
> >>> pat2=re2.compile(regex)
> >>> x=pat2.extract(buf)
> >>> x
> {'number': ['12', '11', '10']}
> Before the compression stage the _Match object actually looked like this:
> {'_group0': {'_value': '12 drummers drumming, 11 pipers piping, 10
> lords
> '10'}}]}}
> But the compression algorithm collected the named groups and brought
> them to the surface, to return the much nicer looking:
> {'number': ['12', '11', '10']}

I confess I didn't think about how that could be cleanly
implemented, but both outputs you present above look inadequate
in my opinion. Regular expressions already have a widely adopted
meaning. If we're going to introduce new features, we should try
to do that without breaking the well-known meanings they already
have.
> > I find the feature very interesting, but being used to living without it,
> > I have difficulty evaluating its usefulness.
> Yes - this is a good point too, because it ~is~ different from the re
> library.  re2 aims to do all that searching, grouping, iterating and
> collecting and constructing work for you.
> Actually, I ~would~ like to limit it to just named groups.
> I reckon, if you're not going to bother naming a group, then why would
> you have any interest in it?
> I guess it's up for discussion how confusing this "new" way of thinking
> could be and what drawbacks it might have.

Your target does seem to be a new kind of regular expression.
In that case, I'm not sure "re2" is the right name for it, given
that you haven't written an improved SRE, but a completely new
kind of regular expression matching which depends on SRE itself
rather than extending it in a compatible way.

While I would like to see *some* kind of successive matching
implemented in SRE (besides the Scanner which is already available),
I'm not in favor of that specific implementation.
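For reference, a minimal sketch of the Scanner mentioned above.
Note that re.Scanner is an undocumented part of the re module, so its
interface may change; the lexicon and sample string below are purely
illustrative. Each (pattern, action) pair is tried in order at the
current position, the action's return value is collected, and an
action of None discards the matched text.

```python
import re

# Each entry pairs a pattern with an action called as action(scanner, token).
# Alternatives are tried left to right at the current position;
# an action of None means the token is matched but discarded.
scanner = re.Scanner([
    (r"\d+", lambda scanner, token: ("NUMBER", token)),
    (r"[^\d,]+", lambda scanner, token: ("WORDS", token.strip())),
    (r",\s*", None),
])

# scan() returns the collected results plus any unscanned tail of the input.
tokens, remainder = scanner.scan("12 drummers drumming, 11 pipers piping")
print(tokens)
# [('NUMBER', '12'), ('WORDS', 'drummers drumming'), ('NUMBER', '11'), ('WORDS', 'pipers piping')]
print(repr(remainder))  # ''
```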

I'm open to discussing that further.

Gustavo Niemeyer
