Question: Optional Regular Expression Grouping
Vlastimil Brom
vlastimil.brom at gmail.com
Mon Oct 10 18:59:46 EDT 2011
2011/10/10 galyle <galyle at gmail.com>:
> Hi, I've looked through this forum, but I haven't been able to find a
> resolution to the problem I'm having (maybe I didn't look hard enough
> -- I have to believe this has come up before). The problem is this:
> I have a file which has 0, 2, or 3 groups that I'd like to record;
> however, in the case of 3 groups, the third group is correctly
> captured, but the first two groups get collapsed into just one group.
> I'm sure that I'm missing something in the way I've constructed my
> regular expression, but I can't figure out what's wrong. Does anyone
> have any suggestions?
>
> The demo below showcases the problem I'm having:
>
> import re
>
> valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
> line1 = "[field1][field2] = blarg"
> line2 = " 'a continuation of blarg'"
> line3 = "[field1][field2][field3] = blorg"
>
> m = valid_line.match(line1)
> print 'Expected: ' + m.group(1) + ', ' + m.group(2)
> m = valid_line.match(line2)
> print 'Expected: ' + str(m.group(1))
> m = valid_line.match(line3)
> print 'Uh-oh: ' + m.group(1) + ', ' + m.group(2)
Hi,
I believe the space before = is causing problems (the pattern does not allow for it);
you also need non-greedy quantifiers +? to match as little as possible,
as opposed to the greedy default. The greedy \S+ also matches ] and [,
so on line3 the first group swallows "field1][field2":
valid_line = re.compile('^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')
or you can use patterns explicitly excluding the closing ], like:
valid_line = re.compile('^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
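A quick way to compare the three patterns is to run them side by side on the sample lines from the question. This is only a sketch: it assumes Python 3 (the question uses Python 2 print statements), it writes the patterns as raw strings, which the originals are not, and the names original, lazy, negated and lines are made up for the illustration.

```python
import re

# Pattern from the question: greedy \S+ and no allowance for a space before "=".
# Raw strings are an assumption of this sketch; the regexes themselves are unchanged.
original = re.compile(r"^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$")

# First suggestion: non-greedy +? and optional whitespace before "=".
lazy = re.compile(r"^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$")

# Second suggestion: character classes that cannot run past a closing "]".
negated = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$")

lines = [
    "[field1][field2] = blarg",
    " 'a continuation of blarg'",
    "[field1][field2][field3] = blorg",
]

for name, pattern in [("original", original), ("lazy", lazy), ("negated", negated)]:
    for line in lines:
        m = pattern.match(line)
        # groups() shows all three capture groups; unused ones are None.
        print(name, repr(line), m.groups() if m else None)
```

On the three-field line the original pattern should report ('field1][field2', 'field3', None), while both suggested patterns report ('field1', 'field2', 'field3'). The continuation line matches the second alternative in every case, so all of its groups are None.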
hth
vbr