[Tutor] Regex

tpc at csua.berkeley.edu
Mon Aug 25 10:50:48 EDT 2003


Hi Andrei, my first thought is to use a negative lookahead: the pattern
you are searching for, followed by a (?!...) group.  If the pattern inside
that group matches right after the main pattern, the regexp as a whole
does not match.  Example:

<paste>
>>> import re
>>> testsearch = re.compile('tetsuro(?!hello)', re.IGNORECASE)
>>> testsearch.search('tetsurohello')
>>> testsearch.search('TETSUROhello')
>>> testsearch.search('hitetsuroone')
<_sre.SRE_Match object at 0x860e4a0>
</paste>

So you are searching for the string 'tetsuro' and you want a match unless
it is immediately followed by the string 'hello'.  Simple enough, and
after reading amk's site on re it only gets simpler.  If the pattern
inside the negative lookahead matches, the re as a whole does not match.
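A small sketch of the same pattern as a standalone script, assuming Python 3's `re` module (the helper name `finds` is hypothetical). It shows two properties of the lookahead: it only inspects the characters *after* 'tetsuro', so a 'hello' in front does not prevent a match, and it is zero-width, so the matched text is just 'tetsuro'.

```python
import re

# Negative lookahead: match 'tetsuro' only if it is NOT immediately followed by 'hello'.
pattern = re.compile(r'tetsuro(?!hello)', re.IGNORECASE)

def finds(text):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(finds('tetsurohello'))   # None: 'hello' follows 'tetsuro'
print(finds('hellotetsuro'))   # tetsuro: 'hello' comes before, and the lookahead never looks back
print(finds('TETSUROworld'))   # TETSURO: the lookahead itself consumes no characters
```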

What's confusing about negative lookaheads is that in practice they don't
always seem to behave this way.  For example, Danny Yoo's 'A Quick
Introduction to Python' has a regular expression that recognizes most
http URLs; it matches the URL itself and leaves out the surrounding
material, yet it still matches a URL sitting inside a link tag:

<paste>
>>> myre = re.compile(r'http://[\w\.-]+\.?(?![\w.-/])', re.IGNORECASE)
>>> myre.search('http://python.org')
<_sre.SRE_Match object at 0x83db1e0>
>>> myre.search('<a href="http://python.org">Python</a>')
<_sre.SRE_Match object at 0x83deb70>
>>> myre.search('<a href="http://python.org">Python</a>').group(0)
'http://python.org'
</paste>
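The lookahead in that URL regex sits *after* the URL, so it only examines what follows the match (here the closing `"`); the `href="` in front is never checked, which is why the link-tag case still matches. For a "not preceded by" condition, the relevant construct is a negative lookbehind, `(?<!...)`, which is a different technique from the lookahead discussed above. A minimal sketch, assuming the only prefix to reject is exactly `href="` (single-quoted attributes or spaces around `=` would slip through) and using a simplified, hypothetical URL pattern rather than the one from the tutorial:

```python
import re

# Hypothetical, simplified URL pattern: 'http://' followed by word characters,
# dots, slashes and hyphens. The negative lookbehind (?<!href=") makes the
# match fail when the six characters just before 'http://' are exactly 'href="'.
# Python requires lookbehind patterns to be fixed-width; 'href="' is.
url_re = re.compile(r'(?<!href=")http://[\w./-]+', re.IGNORECASE)

def bare_urls(text):
    """Hypothetical helper: return URLs not directly preceded by 'href="'."""
    return url_re.findall(text)

print(bare_urls('http://python.org'))                       # ['http://python.org']
print(bare_urls('<a href="http://python.org">Python</a>'))  # []
print(bare_urls('see http://python.org/doc and <a href="http://example.com">x</a>'))
# ['http://python.org/doc']
```

Because lookbehinds must be fixed-width, variations such as optional whitespace around `=` cannot go inside a single lookbehind; for real HTML, a parser such as the standard library's `html.parser` is more robust than a regex.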

The 'r' in front of the quotes makes it a "raw string": Python does not
treat backslashes in it as escape sequences, so they reach the re module
unchanged.  Now I hope you are as interested and as confused as I am.
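A quick way to see what the `r` prefix changes, assuming Python 3. `\b` is a handy test case because it means one thing to Python (the backspace escape) and another to `re` (a word boundary):

```python
import re

text = 'a bad bed'

# Ordinary literal: '\b' is Python's backspace escape, a single character.
print(len('\b'), repr('\b'))     # 1 '\x08'
# Raw literal: the backslash is kept, giving two characters that re reads
# as the word-boundary assertion \b.
print(len(r'\b'), repr(r'\b'))   # 2 '\\b'

print(re.search(r'\bbed\b', text))                # a match for 'bed' at span (6, 9)
print(re.search('\bbed\b', text))                 # None: searches for literal backspaces
print(re.search('\\bbed\\b', text) is not None)   # True: doubled backslashes also work
```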


On Sun, 24 Aug 2003, Andrei wrote:

> I'm quite sure I've seen a question of this type before, but I seem
> unable to find it. How can I match a re pattern ONLY if it is not
> preceded by another re pattern? I know how to construct both individual
> regexes, but I don't know how to glue them together in order to achieve
> this.
>
> Think for example of finding all URLs in a piece of text, but *not* if
> they are inside link tags and therefore preceded by 'href="'.
> <a href="http://python.org">Python</a> shouldn't give a match, but
> http://python.org on its own should.
>
> Andrei
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>