[Tutor] Question regular expressions - the non-greedy pattern

Tue Jan 22 01:39:58 CET 2013

Hi Marcin,

On 21 January 2013 23:11, Marcin Mleczko <Marcin.Mleczko at onet.eu> wrote:

> first thank you very much for the quick reply.
>
No problem...

> The functions used here, i.e. re.match(), are taken directly from the
> example in the mentioned HowTo. I'd rather use re.findall() but I
> think the general interpretation of the given regexp should be nearly
> the same in both functions.
>

... except that the results are fundamentally different because the 2
functions have different goals: the one (match) only tries to match a regex
starting at the first character of a string.  (The starting point is never
conceptually "walked forward"; the engine only moves forward through the
string while matching from that fixed start.)  The other (findall), like
search, finds the first possible match (conceptually walking the starting
point forward only as far as necessary to find a possible match.)

> So I'd like to neglect the choice of a particular function for a
> moment and concentrate on the pure theory.
> What I got so far:
> in theory, from s = '<<html><head><title>Title</title>'
> '<.*?>' would match '<html>' '<head>' '<title>' '</title>'
> to achieve this the engine should:
> 1. walk forward along the text until it finds <
> 2. walk forward from that point until it finds >
>

Here, conceptually, the regex engine's work for your original regex is
complete and it returns a match.
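
You can check where the engine stops by looking at the span of the first
match. This is only a quick sketch using re.search() on the string s from
your example:

```python
import re

s = '<<html><head><title>Title</title>'

# Step 1 finds the first '<' at index 0; step 2 stops at the first '>'.
m = re.search(r'<.*?>', s)

print(m.span())   # (0, 7)
print(m.group())  # <<html>
```

The match begins at the very first <, so the second < simply becomes part
of what .*? consumes.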

> 3. walk backward from that point (the one of >) until it finds <
>

No.  No further walking backward when you've already matched the regex.

> 4. return the string between < from 3. and > from 2. as this gives the
> least possible string between < and >
>

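"Non-greedy" doesn't imply conceptually altering the starting point in
a backwards manner after you've already found a match.  It only means that,
from a fixed starting point, the quantifier consumes as few characters as
it can while still letting the rest of the regex match.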
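
To illustrate, here is a rough comparison on the string s from your example.
The greedy and non-greedy versions start at the same place and differ only
in where they end. The last pattern, '<[^<>]*>', is my own suggestion (it is
not from the HowTo) for one possible way to get the "least possible string
between < and >" that you describe:

```python
import re

s = '<<html><head><title>Title</title>'

lazy = re.search(r'<.*?>', s)    # non-greedy: stop at the first '>'
greedy = re.search(r'<.*>', s)   # greedy: run on to the last '>'

# Both matches start at index 0; only the end point differs.
print(lazy.start(), greedy.start())  # 0 0
print(lazy.group())                  # <<html>
print(greedy.group() == s)           # True

# findall() with the non-greedy pattern still keeps the leading '<<'.
lazy_all = re.findall(r'<.*?>', s)
print(lazy_all)    # ['<<html>', '<head>', '<title>', '</title>']

# Excluding '<' and '>' from the middle forces the shortest tag.
strict_all = re.findall(r'<[^<>]*>', s)
print(strict_all)  # ['<html>', '<head>', '<title>', '</title>']
```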

> Did I get this right so far? Is this (=least possible string between <
> and >), what non-greedy really translates to?
>

No, as explained above.

Walter