sre handling of groups in repeats

Greg Chapman glc at well.com
Mon Mar 31 15:17:22 EST 2003


I was going to post this as a bug report on SourceForge, but since I'm not
entirely sure what the correct behavior should be, I thought I'd post it here
first to get some comments.

In Python 2.2.2 and 2.3a2, I get the following for these patterns:

1) re.match('((a)|b)*', 'abc').groups()
('b', '')

2) re.match('((a)|b)*?c', 'abc').groups()
('b', '')

3) re.match('((?=(a)?)[ab])*', 'abc').groups()
('b', 'a')

For patterns 1 and 2, the other regex engines I've tried (Python's pre, Perl
5.6, Windows JScript, .NET) all agree that group 2 should be 'a'.  For pattern
3, Perl 5.6 reports group 2 as undefined; the others all agree with sre that it
should be 'a'.
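
For anyone who wants to compare engines or Python versions, here is a small sketch that runs the three patterns above against 'abc' and prints the whole match plus the groups. The names PATTERNS and report are just illustrative helpers, not part of the re module. What group 2 comes out as depends on the engine and version, which is the question at hand, so no expected output is given for it.

```python
import re

# The three patterns from the report, in the same order.
PATTERNS = [
    r'((a)|b)*',
    r'((a)|b)*?c',
    r'((?=(a)?)[ab])*',
]


def report(subject='abc'):
    """Return a list of (pattern, whole match, groups) tuples."""
    results = []
    for pattern in PATTERNS:
        m = re.match(pattern, subject)
        results.append((pattern, m.group(0), m.groups()))
    return results


if __name__ == '__main__':
    for pattern, whole, groups in report():
        # groups[1] is group 2, the value the engines disagree on.
        print('%-20s matched %-6r groups %r' % (pattern, whole, groups))
```

Group 1 should be 'b' in every case, since that is the last thing the outer group matched; only the second element of each tuple is in dispute.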

sre certainly seems wrong in the first two cases; the second group should either
be 'a' or None.  Although it makes more sense to me for it to be None, since all
the other engines say that it should be 'a', I suppose that's what a fix should
implement.  In the third case, it seems to me that Perl is correct that group 2
should be None.

So, anyone have any definitive answers on this (or pointers to same)?

Thanks.

---
Greg Chapman




