<br><br><div class="gmail_quote">On Fri, Nov 13, 2009 at 12:47 AM, MRAB <span dir="ltr"><<a href="mailto:python@mrabarnett.plus.com">python@mrabarnett.plus.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div><div></div><div class="h5">S.Selvam wrote:<br>


<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


Hi all,<br>


<br>


<br>


1) I need to remove the <a> tag that comes just before the keyword (i.e. the one containing some_text2), leaving the others intact.<br>


<br>


2) The input string may or may not contain <a> tags.<br>


<br>


3) Sample input:      inputstr = """start <a href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything"""<br>


<br>


4) I came up with the following regex,<br>


<br>


   p=re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)',re.DOTALL|re.I)<br>


   s=p.search(inputstr)<br>


  but the second group matches both <a> tags, while I need it to match only the one closest to the keyword.<br>


<br>


I would like to get your suggestions.<br>


<br>


Note:<br>


<br>


   If I leave group('good1') greedy, then it matches both <a> tags.<br>


<br>


</blockquote></div></div>


".*?" can match any number of any character, so it can match any<br>


intervening "<a>" tags. Try "[^<]*?" instead.<br><font color="#888888">


<br></font></blockquote><div><br>Thanks a lot, the following has done it:<br> <br>     p=re.compile(r'(?:<a[^<]*?<\/a>\s*%s)'%(keyword),re.I|re.S)<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<font color="#888888">


-- <br></font><div><div></div><div class="h5">


<a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>


</div></div></blockquote></div><br><br clear="all"><br>-- <br>Yours,<br>S.Selvam<br>
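<br>
P.S. For anyone finding this thread later, here is a minimal sketch of how the "[^<]" idea could be used to actually drop the tag. It assumes the whole <a> element (tag plus link text) directly before the keyword should go, and that the keyword itself is kept. The helper name strip_anchor_before, the use of re.escape and the capturing group are my own additions for illustration, not something from the discussion above. re.S is left out because the pattern no longer contains a ".".<br>

```python
import re

def strip_anchor_before(text, keyword):
    # Hypothetical helper: removes an <a>...</a> element that sits
    # directly before `keyword`, keeping the keyword itself.
    # "[^<]*" cannot cross a "<", so the match cannot start at an
    # earlier <a> tag and run across the intervening tags.
    pattern = re.compile(r'<a\b[^<]*</a>\s*(%s)' % re.escape(keyword), re.I)
    return pattern.sub(r'\1', text)

inputstr = '''start <a href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything'''
result = strip_anchor_before(inputstr, "keyword")
print(result)

# An input without any <a> tags is returned unchanged.
print(strip_anchor_before("start keyword anything", "keyword"))
```

If only the tags should be removed and the link text kept, the pattern would need a second group around the text between the tags.<br>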