[OT] a little about regex

Wed Oct 18 03:32:39 EDT 2006

Fulvio wrote:
> Hello,
> 
> I'm trying to get an assertion working that filters addresses from some 
> domain, but not if it's prefixed by '.com'.
> Even putting the result in a negated test doesn't give me the wanted result.
> 
> The thought in program terms:
> 
>>>> def filter(adr):
> ...     import re
> ...     allow = re.compile(r'.*\.my(>|$)')
> ...     deny = re.compile(r'.*\.com\.my(>|$)')
> ...     cnt = 0
> ...     if deny.search(adr): cnt += 1
> ...     if allow.search(adr): cnt += 1
> ...     return cnt
> ...
>>>> filter('some.ads at lazyfox.com.my')
> 2
>>>> filter('some.ads at lazyfox.net.my')
> 1
> 
> It seems I'm missing a better regex that would keep both filters from 
> firing. I'm thinking of a lookbehind (negative or positive), but I haven't 
> managed to work it out yet.
> I think either allow should require that there is no '.com' before '.my', or 
> deny should match _only_ '.com' before '.my'. Sorry, I can't get the syntax 
> right.
> 
> Suggestions are welcome.
> 
> F

Instead of using two separate ifs, use an if/elif and be sure to test the 
narrower filter first (you already have them in the correct order). That way 
the more general filter is skipped when the narrower one matches, and cnt is 
not incremented twice.
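
A minimal sketch of the if/elif version, reusing your two patterns unchanged. 
The name classify is my own (picked so it doesn't shadow the built-in filter), 
and I've written the patterns as raw strings; treat it as an illustration 
rather than a finished filter:

```python
import re

# Patterns copied from the original post, written as raw strings.
ALLOW = re.compile(r'.*\.my(>|$)')
DENY = re.compile(r'.*\.com\.my(>|$)')

def classify(adr):
    """Count at most one match, testing the narrower pattern first."""
    cnt = 0
    if DENY.search(adr):       # narrower: only '.com.my'
        cnt += 1
    elif ALLOW.search(adr):    # more general: any '.my', reached only if DENY missed
        cnt += 1
    return cnt
```

With this, both 'some.ads at lazyfox.com.my' and 'some.ads at lazyfox.net.my' 
give 1, and an address that doesn't end in '.my' gives 0, so the counter alone 
no longer says which filter fired.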

It's not exactly clear what output you are seeking.  If you want 0 for not 
filtered and 1 for filtered, then look at Fred's hint.

Or are you writing a test at the moment, where a 1 means the address matched 
only one filter, so you know your filters are working as designed?

Another approach would be to assign values for filtered, accepted, and undefined 
and set those accordingly instead of incrementing and decrementing a counter.
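
Something along these lines, as a rough sketch. The three label values and the 
name check are placeholders of mine, and I'm assuming a '.com.my' address 
counts as filtered and any other '.my' address as accepted:

```python
import re

# Patterns copied from the original post, written as raw strings.
ALLOW = re.compile(r'.*\.my(>|$)')
DENY = re.compile(r'.*\.com\.my(>|$)')

# Hypothetical labels; any three distinct values would do.
FILTERED, ACCEPTED, UNDEFINED = 'filtered', 'accepted', 'undefined'

def check(adr):
    """Return a named status instead of a counter."""
    if DENY.search(adr):       # narrower test first
        return FILTERED
    if ALLOW.search(adr):
        return ACCEPTED
    return UNDEFINED           # neither pattern matched
```

Here 'some.ads at lazyfox.com.my' comes back as FILTERED, 
'some.ads at lazyfox.net.my' as ACCEPTED, and anything not ending in '.my' as 
UNDEFINED.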

Cheers,
   Ron