[Tutor] Optional groups in RE's

Kent Johnson kent37 at tds.net
Sun Apr 12 02:54:53 CEST 2009


On Sat, Apr 11, 2009 at 6:46 PM, Moos Heintzen <iwasroot at gmail.com> wrote:
> Hello Tutors!
>
> I was trying to make some groups optional in a regular expression, but
> I couldn't do it.
>
> For example, I have the string:
>
>>>> data = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf  <title>Title</title>"
>
> and the pattern:
>>>> pattern = "<price>(.*?)</price>.*?<ship>(.*?)</ship>.*?<title>(.*?)</title>"
>
> This works when all the groups are present.
>
>>>> re.search(pattern, data).groups()
> ('42', '60', 'Title')
>
> However, I don't know how to make an re to deal with possibly missing groups.
> For example, with:
>>>> data = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf"
>
> I tried
>>>> pattern = "<price>(.*?)</price>.*?<ship>(.*?)</ship>.*?(?:<title>(.*?)</title>)?"
>>>> re.search(pattern, data).groups()
> ('42', '60', None)
>
> but it doesn't work when <title> _is_ present.

This re doesn't have to match anything after </ship> so it doesn't.
You can force it to match to the end by adding $ at the end but that
is not enough, you have to make the "</ship>.*?" *not* match <title>.
One way to do that is to use [^<]*? instead of .*?:

In [1]: import re

In [2]: d1 = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf <title>Title</title>"

In [3]: d2 = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf"

In [7]: pattern = "<price>(.*?)</price>.*?<ship>(.*?)</ship>[^<]*?(?:<title>(.*?)</title>)?$"

In [8]: re.search(pattern, d1).groups()
Out[8]: ('42', '60', 'Title')

In [9]: re.search(pattern, d2).groups()
Out[9]: ('42', '60', None)
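
Note that [^<]*? relies on there being no other "<" between </ship> and
<title>; if another tag sat in between, the pattern above would not match.
If that is a concern, one alternative is to search for each tag separately
and treat a failed search as a missing field. This is only a minimal
sketch: extract_fields is a hypothetical helper name, not something from
the re module, and it assumes each tag appears at most once and is not
nested.

```python
import re

def extract_fields(text, tags=("price", "ship", "title")):
    """Return the text inside each tag, or None where the tag is absent.

    Hypothetical helper for illustration; assumes each tag occurs at most
    once and is not nested.
    """
    values = []
    for tag in tags:
        # One independent search per tag, so a missing tag cannot affect
        # the others.
        m = re.search("<%s>(.*?)</%s>" % (tag, tag), text)
        values.append(m.group(1) if m else None)
    return tuple(values)

d1 = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf <title>Title</title>"
d2 = "<price>42</price> sdlfks d f<ship>60</ship> sdf sdf"

print(extract_fields(d1))  # ('42', '60', 'Title')
print(extract_fields(d2))  # ('42', '60', None)
```

The trade-off is that this no longer checks the order of the tags, so it
fits best when the tags can appear in any order or with other markup
between them.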

Kent

