Removing an attribute from html with Regex
Selvam
s.selvamsiva at gmail.com
Thu Dec 30 02:30:58 EST 2010
Hi all,
I have an HTML string which I would like to feed to BeautifulSoup,
but one malformed attribute breaks the parser:
<p style='terp_header' wrong_tag=' text1 ' text2 ' and 'para' '
class='terp_header'> My String</p>
I would like to replace all occurrences of that attribute with an
empty string.
I am unable to figure out a regex that can do this job.
This is what I have managed so far:
import re
m = re.compile(r"rml_except='([^']*)")
As you can see, it stops at the first single quote it encounters.
Any suggestions will be useful.
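One possible direction, offered only as a rough sketch: instead of stopping at the first quote, match lazily up to a quote that is followed by either the next `name=` pair or the end of the tag. The helper name `strip_attribute` is hypothetical, and the sketch uses `wrong_tag` from the sample above rather than `rml_except`. It assumes single-quoted values, no `>` inside the value, and no `' name=` sequence inside the value; any of those would end the match early or prevent it.

```python
import re


def strip_attribute(html, name):
    """Remove a single-quoted attribute whose value contains stray quotes.

    Hypothetical helper: the value is taken to end at the first quote that
    is followed by another attribute (name=) or by the end of the tag.
    """
    pattern = re.compile(
        r"\s+" + re.escape(name) + r"\s*=\s*'"  # attribute name and opening quote
        r"[^>]*?"                               # value, as short as possible, never past '>'
        r"'"                                    # a candidate closing quote ...
        r"(?=\s*(?:[\w:-]+\s*=|/?>))"           # ... followed by next attribute or tag end
    )
    return pattern.sub("", html)


sample = (
    "<p style='terp_header' wrong_tag=' text1 ' text2 ' and 'para' '\n"
    "class='terp_header'> My String</p>"
)
cleaned = strip_attribute(sample, "wrong_tag")
print(cleaned)
```

The lookahead is what lets the match skip the inner quotes: each of them is followed by plain text rather than by `name=` or `>`, so only the last quote qualifies. The cleaned string can then be handed to BeautifulSoup as usual.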
--
Regards,
S.Selvam
SG E-ndicus Infotech Pvt Ltd.
http://e-ndicus.com/
" I am because we are "