[Tutor] Optional groups in RE's

Moos Heintzen iwasroot at gmail.com
Sun Apr 12 00:46:18 CEST 2009

Hello Tutors!

I was trying to make some groups optional in a regular expression, but
I couldn't do it.

For example, I have the string:

>>> import re
>>> data = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf  <title>Title</title>"

and the pattern:
>>> pattern = "<price>(.*?)</price>.*?<ship>(.*?)</ship>.*?<title>(.*?)</title>"

This works when all the groups are present.

>>> re.search(pattern, data).groups()
('42', '60', 'Title')

However, I don't know how to make an re to deal with possibly missing groups.
For example, with:
>>> data = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf"

I tried
>>> pattern = "<price>(.*?)</price>.*?<ship>(.*?)</ship>.*?(?:<title>(.*?)</title>)?"
>>> re.search(pattern, data).groups()
('42', '60', None)

but it doesn't work when <title> _is_ present.

>>> data = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf  <title>Title</title>"
>>> re.search(pattern, data).groups()
('42', '60', None)
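
A note on why this probably happens, offered as a sketch rather than a definitive diagnosis: the lazy .*? in front of the optional group is allowed to match nothing, so the optional group is only tried at the position right after </ship>. Since <title> does not start there, the group matches zero times and the overall match succeeds without it. Looking at group(0) shows where the match stops (matched_text is just an illustrative name):

```python
import re

data = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf  <title>Title</title>"
pattern = r"<price>(.*?)</price>.*?<ship>(.*?)</ship>.*?(?:<title>(.*?)</title>)?"

m = re.search(pattern, data)

# The whole match ends right after </ship>: the trailing lazy .*? matched
# the empty string and the optional group matched zero times.
matched_text = m.group(0)
print(matched_text)
print(m.groups())
```

The match never reaches the <title> tag at all, which is why the third group stays None.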

I tried something like (?:pattern)+ and (?:pattern)* but I couldn't
get what I wanted.
(.*?)? doesn't seem to be a valid re either.

I know (?:pattern) is a non-capturing group.
I just read that | has very low precedence, so I used parentheses
liberally to "or" the pattern with an empty string.

>>> pattern = "<price>(.*?)</price>.*?<ship>(.*?)</ship>.*?(?:(?:<title>(.*?)</title>)|)"
>>> re.search(pattern, data).groups()
('42', '60', None)

(?:(?:pattern)|(?:.*)) didn't work either.

I want to be able to have some groups as optional, so that when a group
isn't matched, it returns None, and when it is matched, it returns what
was matched.

Is that possible with one re?

I could probably do it with more than one re (and did it) but with one
re the solution is much more elegant.
(i.e. I could have named groups, then pass the resultant dictionary to
a processing function)
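
One possible way to get this with a single re, sketched under the assumption that the tags always appear in the order price, ship, title: move the lazy .*? inside the optional group, so the engine has to scan forward for <title> before it gives up on the group. The group names and the extract helper are illustrative choices, not anything provided by the re module itself.

```python
import re

# The skipping .*? lives inside the optional group, so the group is tried
# (scanning forward for <title>) before the engine settles for no title.
pattern = re.compile(
    r"<price>(?P<price>.*?)</price>"
    r".*?<ship>(?P<ship>.*?)</ship>"
    r"(?:.*?<title>(?P<title>.*?)</title>)?"
)

def extract(text):
    """Return a dict of the named groups, or None if price/ship are missing."""
    m = pattern.search(text)
    return m.groupdict() if m else None

with_title = extract(
    "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf  <title>Title</title>"
)
without_title = extract("<price>42</price> sdlfks d f<ship>60</ship> sdf sdf")
print(with_title)
print(without_title)
```

The resulting dictionary has a None value for a missing title, so it can be passed straight to a processing function. Note that . does not match newlines unless re.DOTALL is given, so multi-line input would need that flag.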

I have also tried matching optional groups before, and I am curious about the solution.

