How make regex that means "contains regex#1 but NOT regex#2" ??

Vlastimil Brom vlastimil.brom at gmail.com
Tue Jul 1 07:11:12 EDT 2008


2008/7/1, seberino at spawar.navy.mil <seberino at spawar.navy.mil>:
>
> I'm looking over the docs for the re module and can't find how to
> "NOT" an entire regex.
>
> For example.....
>
> How make regex that means "contains regex#1 but NOT regex#2" ?
>
> Chris
>
Maybe I'm missing something, but a negative lookahead seems to do roughly
that (at least for simpler cases); the usual form is to check the text
after the match, but it can also be used at the beginning of the pattern.

i.e. (?!regex#2)regex#1
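
A minimal sketch of that form with Python's re module. REGEX_1 and REGEX_2
are hypothetical stand-ins, not patterns from the original question. Note
that the lookahead only tests the position where the match starts; if the
requirement is about the whole string, two separate searches (the helper
contains_1_not_2 below, also hypothetical) may be simpler.

```python
import re

# Hypothetical stand-ins for regex#1 and regex#2 (not from the original post).
REGEX_1 = r"\w+\.py\b"   # what we want to match: names ending in .py
REGEX_2 = r"test_"       # what must not start at the match position

# The leading \b pins each attempt to the start of a word, so the lookahead
# cannot be bypassed by starting the match one character later.
pattern = re.compile(r"\b(?!" + REGEX_2 + r")" + REGEX_1)

text = "test_foo.py main.py utils.py"
matches = pattern.findall(text)
print(matches)


def contains_1_not_2(s):
    """Whole-string variant: regex#1 occurs somewhere and regex#2 nowhere."""
    return re.search(REGEX_1, s) is not None and re.search(REGEX_2, s) is None


print(contains_1_not_2("main.py"))
print(contains_1_not_2("test_foo.py main.py"))
```

Without the leading \b, the search would simply retry one character further
on and match "est_foo.py", which is one way the two sub-patterns can fail to
line up.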

e.g. the following should search for "words" while discarding some very
frequent ones; the important part seems to be keeping the excluding regex#2
compatible with the matching regex#1 (here both are anchored on word
boundaries).

(?!\b(?:an?|the|is|are|of|in|to|and)\b)\b\w+\b

(without the checks for word boundaries \b, this pattern would also exclude
"words" only partly containing the stopwords)
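
A short sketch comparing the pattern above with a variant that drops the \b
checks inside the lookahead only (one reading of "without the checks"). The
sample sentence and variable names are made up for illustration.

```python
import re

# The pattern from the post: stopwords are excluded only as whole words.
with_boundaries = re.compile(
    r"(?!\b(?:an?|the|is|are|of|in|to|and)\b)\b\w+\b"
)

# Variant with the \b checks removed from the lookahead only:
# any word that merely starts with a stopword is rejected too.
without_boundaries = re.compile(
    r"(?!(?:an?|the|is|are|of|in|to|and))\b\w+\b"
)

# Made-up sample text.
text = "the theory of an island is in another python town"

kept = with_boundaries.findall(text)
kept_loose = without_boundaries.findall(text)

print(kept)        # words kept by the original pattern
print(kept_loose)  # words kept by the variant
```

With this sample, the original pattern keeps "theory", "island", "another",
"python" and "town", while the variant keeps only "python", since the other
words begin with "the", "is", "a" or "to".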

regards

  vbr