regex remove closest tag
S.Selvam
s.selvamsiva at gmail.com
Fri Nov 13 00:18:31 EST 2009
On Fri, Nov 13, 2009 at 12:47 AM, MRAB <python at mrabarnett.plus.com> wrote:
> S.Selvam wrote:
>
>> Hi all,
>>
>>
>> 1) I need to remove only the <a> tag that comes just before the keyword
>> (i.e. the one around some_text2), leaving the others alone.
>>
>> 2) input string may or may not contain <a> tags.
>>
>> 3) Sample input: inputstr = """start <a
>> href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything"""
>>
>> 4) I came up with the following regex,
>>
>>
>> p=re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)',re.DOTALL|re.I)
>> s=p.search(inputstr)
>> but the second group matches both <a> tags, while I need to match only
>> the nearest one.
>>
>> I would like to get your suggestions.
>>
>> Note:
>>
>> If I leave group('good1') greedy, it matches both <a> tags.
>>
> ".*?" can match any number of any character, so it can match any
> intervening "<a>" tags. Try "[^<]*?" instead.
>
>
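To illustrate the point above, here is a minimal sketch comparing the two patterns on the sample input from the question. The input is written on one line for brevity, and the variable names `lazy` and `bounded` are only for this illustration; they are not from the original messages.

```python
import re

# Sample input from the question, written on a single line.
inputstr = ('start <a href="some_url">some_text1</a> '
            '<a href="">some_text2</a> keyword anything')

# ".*?" is lazy but may still cross "<", so the match starts at the
# first <a> and runs through the second one to reach the keyword.
lazy = re.search(r'<a.*?</a>\s*keyword', inputstr, re.I | re.S)

# "[^<]*?" cannot cross a "<", so the match has to start at the <a>
# that sits directly before the keyword.
bounded = re.search(r'<a[^<]*?</a>\s*keyword', inputstr, re.I | re.S)

print(lazy.group(0))
print(bounded.group(0))
```

The first match spans both tags, while the second covers only the tag next to the keyword. Note that "[^<]" also rules out any nested markup inside the <a> element, which may or may not suit real input.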
Thanks a lot,
p = re.compile(r'(?:<a[^<]*?</a>\s*%s)' % keyword, re.I | re.S)
has done it!
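
For anyone finding this later, a rough sketch of how that pattern might be used to drop the nearest tag. It assumes the goal is to delete the whole <a>...</a> element while keeping the keyword itself; the function name `remove_closest_anchor` is hypothetical, and `re.escape` is added here in case the keyword contains regex metacharacters.

```python
import re

def remove_closest_anchor(text, keyword):
    """Remove the <a>...</a> element directly before keyword, if any.

    Assumption: the element is deleted entirely and the keyword is kept.
    """
    # [^<]* cannot cross another tag's "<", so only the element that is
    # immediately followed by the keyword can match.
    pattern = re.compile(r'<a\b[^<]*</a>(\s*%s)' % re.escape(keyword), re.I)
    # Put back the captured whitespace and keyword, dropping the tag.
    return pattern.sub(r'\1', text)

inputstr = ('start <a href="some_url">some_text1</a> '
            '<a href="">some_text2</a> keyword anything')
result = remove_closest_anchor(inputstr, 'keyword')
print(result)

# Input without any <a> tag is returned unchanged.
plain = remove_closest_anchor('start keyword anything', 'keyword')
print(plain)
```

The whitespace around the removed tag is left in place, so the result may contain a doubled space where the tag used to be.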
--
Yours,
S.Selvam