<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

George Burdell wrote:

<blockquote

 cite="mid:b0a74686-5f27-459b-84d3-8eef55ed9e01@m11g2000yqf.googlegroups.com"

 type="cite">

  <pre wrap="">On Sep 6, 10:06 pm, "Mark Tolonen" <a class="moz-txt-link-rfc2396E" href="mailto:metolone+gm...@gmail.com"><metolone+gm...@gmail.com></a> wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap=""><a class="moz-txt-link-rfc2396E" href="mailto:gburde...@gmail.com"><gburde...@gmail.com></a> wrote in message

<a class="moz-txt-link-freetext" href="news:f98a6057-c35f-4843-9efb-7f36b05b677c@g19g2000yqo.googlegroups.com">news:f98a6057-c35f-4843-9efb-7f36b05b677c@g19g2000yqo.googlegroups.com</a>...

    </pre>

    <blockquote type="cite">

      <pre wrap="">If I do this:

      </pre>

    </blockquote>

    <blockquote type="cite">

      <pre wrap="">import re

a=re.search(r'hello.*?money',  'hello how are you hello funny money')

      </pre>

    </blockquote>

    <blockquote type="cite">

      <pre wrap="">I would expect a.group(0) to be "hello funny money", since .*? is a

non-greedy match. But instead, I get the whole sentence, "hello how

are you hello funny money".

      </pre>

    </blockquote>

    <blockquote type="cite">

      <pre wrap="">Is this expected behavior? How can I specify the correct regexp so

that I get "hello funny money" ?

      </pre>

    </blockquote>

    <pre wrap="">A non-greedy match matches the fewest characters before matching the text

*after* the non-greedy match.  For example:

    </pre>

    <blockquote type="cite">

      <blockquote type="cite">

        <blockquote type="cite">

          <pre wrap="">import re

a=re.search(r'hello.*?money','hello how are you hello funny money and

more money')

a.group(0)  # non-greedy stops at the first money

          </pre>

        </blockquote>

      </blockquote>

    </blockquote>

    <pre wrap="">'hello how are you hello funny money'>>> a=re.search(r'hello.*money','hello how are you hello funny money and

    </pre>

    <blockquote type="cite">

      <blockquote type="cite">

        <blockquote type="cite">

          <pre wrap="">more money')

a.group(0)  # greedy keeps going to the last money

          </pre>

        </blockquote>

      </blockquote>

    </blockquote>

    <pre wrap="">'hello how are you hello funny money and more money'

This is why it is difficult to use regular expressions to match nested

objects like parentheses or XML tags.  In your case you'll need something

extra to not match the first hello.

    </pre>

    <blockquote type="cite">

      <blockquote type="cite">

        <blockquote type="cite">

          <pre wrap="">a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny

money')

a.group(0)

          </pre>

        </blockquote>

      </blockquote>

    </blockquote>

    <pre wrap="">'hello funny money'

-Mark

    </pre>

  </blockquote>

  <pre wrap=""><!---->

I see now. I also understand r's response. But what if there are many

"hello"'s before "money," and I don't know how many there are? In

other words, I want to find every occurrence of "money," and for each

occurrence, I want to scan back to the first occurrence of "hello."

How can this be done?

  </pre>

</blockquote>

<br>

This is asking for more power than regular expressions can support.<br>

<br>

However, your request reads like an algorithm.  Search for an

occurrence of "money" (using the find string method), and search

backwards from there for "hello" (using the rfind string method).  Two lines

of code in a loop should do it.<br>
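<br>
A minimal sketch of such a loop, assuming the goal stated in the question:
for each "money", scan back to the nearest preceding "hello".  The function
name hello_money_spans and its parameter names are made up for illustration;
only the find and rfind string methods come from the standard library.<br>
<pre wrap="">
```python
# Hypothetical helper: the name and parameters are illustrative only.
def hello_money_spans(text, start_word="hello", end_word="money"):
    """Return one substring per end_word, starting at the nearest
    start_word that precedes it."""
    results = []
    pos = 0
    while True:
        # Next occurrence of end_word at or after pos.
        end = text.find(end_word, pos)
        if end == -1:
            break
        # Nearest start_word lying entirely before that occurrence.
        begin = text.rfind(start_word, 0, end)
        if begin != -1:
            results.append(text[begin:end + len(end_word)])
        # Continue scanning after this end_word.
        pos = end + len(end_word)
    return results


print(hello_money_spans("hello how are you hello funny money"))
```
</pre>
For the sentence from the original question this prints
['hello funny money'].  Note one design choice: a "hello" can be reused, so
a later "money" with no new "hello" in between scans back to the same
"hello" as the previous one.  If that is not wanted, the lower bound passed
to rfind could be moved up to pos.<br>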

<br>

<br>

Gary Herron<br>

<br>

<br>

<br>

</body>

</html>