regex remove closest tag

MRAB python at mrabarnett.plus.com
Thu Nov 12 14:17:16 EST 2009


S.Selvam wrote:
> Hi all,
> 
> 
> 1) I need to remove only the <a> tag that is just before the keyword (i.e.
> some_text2), leaving the others untouched.
> 
> 2) input string may or may not contain <a> tags.
> 
> 3) Sample input: 
>  
>     inputstr = """start <a href="some_url">some_text1</a> <a 
> href="">some_text2</a> keyword anything"""
> 
> 4) I came up with the following regex,
> 
>    
> p=re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)',re.DOTALL|re.I)
>    s=p.search(inputstr)
>   but the second group matches both <a> tags, while I need to match only
> the nearest one.
> 
> I would like to get your suggestions.
> 
> Note:
> 
>    If I make group('good1') greedy, then it matches both <a> tags.
> 
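".*?" can match any number of any character, including "<" and ">", so
the "<a.*?</a>" part can start at the first "<a" and run across any
intervening "<a>" tags until it reaches the "</a>" just before the
keyword. Try "[^<]*?" instead, which cannot cross into another tag.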
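
A minimal sketch of that change, applied to the pattern from the question.
It assumes a "\s*" between "</a>" and "keyword" (not in the original
pattern), because the sample input has a space there; the names "original"
and "fixed" are only for illustration.

```python
import re

inputstr = ('start <a href="some_url">some_text1</a> '
            '<a href="">some_text2</a> keyword anything')

# Pattern from the question, plus an assumed \s* before "keyword" so that
# the first alternative can apply to the sample input.
original = re.compile(
    r'(?P<good1>.*?)(\s*<a.*?</a>\s*keyword|\s*keyword)(?P<good2>.*)',
    re.DOTALL | re.I)

# Same pattern with ".*?" inside the tag replaced by "[^<]*?", which
# cannot run past the "<" of a closing or following tag.
fixed = re.compile(
    r'(?P<good1>.*?)(\s*<a[^<]*?</a>\s*keyword|\s*keyword)(?P<good2>.*)',
    re.DOTALL | re.I)

before = original.search(inputstr)
after = fixed.search(inputstr)

print(repr(before.group(2)))  # spans both <a> tags
print(repr(after.group(2)))   # only the <a> tag next to the keyword
print(repr(after.group('good1') + after.group('good2')))
```

Note that "[^<]*?" also refuses to match when the link text itself contains
markup (for example "<a href=""><b>some_text2</b></a>"), so this only fits
input where the <a> tags hold plain text.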