How make regex that means "contains regex#1 but NOT regex#2" ??

Reedick, Andrew jr9445 at ATT.COM
Tue Jul 1 11:06:49 EDT 2008


	

> -----Original Message-----
> From: python-list-bounces+jr9445=att.com at python.org [mailto:python-
> list-bounces+jr9445=att.com at python.org] On Behalf Of Reedick, Andrew
> Sent: Tuesday, July 01, 2008 10:07 AM
> To: seberino at spawar.navy.mil; python-list at python.org
> Subject: RE: How make regex that means "contains regex#1 but NOT
> regex#2" ??
>  
> Match 'foo.*bar', except when 'not' appears between foo and bar.
> 
> 
> import re
> 
> s = 'fooAAABBBbar'
> print "Should match:", s
> m = re.match(r'(foo(.(?!not))*bar)', s);
> if m:
> 	print m.groups()
> 
> print
> 
> s = 'fooAAAnotBBBbar'
> print "Should not match:", s
> m = re.match(r'(foo(.(?!not))*bar)', s);
> if m:
> 	print m.groups()
> 
> 
> == Output ==
> Should match: fooAAABBBbar
> ('fooAAABBBbar', 'B')
> 
> Should not match: fooAAAnotBBBbar
> 


Fixed a bug with 'foonotbar'.  Conceptually it breaks down into:

	First_half_of_Regex#1(not Regex#2)(any_char_Not_followed_by_Regex#2)*Second_half_of_Regex#1
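
As a rough sketch of that breakdown, the pattern can be assembled from its three pieces. The helper name build_pattern and its argument names are made up for illustration, and it assumes each argument is already a valid regex fragment:

```python
import re

def build_pattern(first_half, excluded, second_half):
    # Hypothetical helper, not part of the re module.
    # guard is a negative lookahead: "Regex#2 does not start here".
    guard = r'(?!(?:%s))' % excluded
    # first half, guard, then any chars each not followed by Regex#2,
    # then the second half.  (?:...) keeps alternations in the
    # fragments from leaking into the rest of the pattern.
    return re.compile(r'(?:%s)%s(?:.%s)*(?:%s)'
                      % (first_half, guard, guard, second_half))

pat = build_pattern('foo', 'not', 'bar')
```

Calling pat.match(s) then behaves like the hand-written foo/not/bar regex for the test strings in this thread.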

However, if possible, I would make it a two-pass match: match on
Regex#1, then throw away any matches that also match Regex#2.  A
two-pass approach is quicker to write and easier to understand, and
easy to understand == less chance of a bug.  If you're worried about
performance, then a) a complicated regex may or may not be faster than
two simple regexes, and b) if you're passing that much data through a
regex, you're probably I/O bound anyway.
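
A minimal sketch of the two-pass idea, assuming Regex#1 is 'foo.*bar' and Regex#2 is 'not' as in the example above. The names REGEX_1, REGEX_2 and two_pass_match are only for illustration:

```python
import re

REGEX_1 = re.compile(r'foo.*bar')   # what must match
REGEX_2 = re.compile(r'not')        # what must not appear in the match

def two_pass_match(s):
    # Pass 1: does the string match Regex#1 at all?
    m = REGEX_1.match(s)
    if m is None:
        return None
    # Pass 2: throw the match away if Regex#2 is found inside it.
    if REGEX_2.search(m.group(0)):
        return None
    return m.group(0)
```

This is not an exact drop-in for the single regex.  With the greedy '.*', a string such as 'fooXbarnotbar' is rejected outright here, whereas the single-pass pattern would still match the 'fooXbar' prefix.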


import re

ss = ('foobar', 'fooAAABBBbar', 'fooAAAnotBBBbar', 'fooAAAnotbar',
'foonotBBBbar', 'foonotbar')

for s in ss:
	print s,
	# foo, no 'not' directly after it, then any chars that are
	# not followed by 'not', then bar
	m = re.match(r'(foo(?!not)(?:.(?!not))*bar)', s)
	if m:
		print m.groups()
	else:
		print


== output ==
foobar ('foobar',)
fooAAABBBbar ('fooAAABBBbar',)
fooAAAnotBBBbar
fooAAAnotbar
foonotBBBbar
foonotbar
