Removing certain tags from html files
sebzzz at gmail.com
Fri Jul 27 14:50:49 EDT 2007
>
> Then take hold of the content and add it to the parent. Something like
> this should work:
>
> from BeautifulSoup import BeautifulSoup
>
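> def remove(soup, tagname):
>     # Drop every <tagname> tag but keep its children in the tree.
>     for tag in soup.findAll(tagname):
>         # Copy the list: moving a child may modify tag.contents in place.
>         contents = list(tag.contents)
>         parent = tag.parent
>         tag.extract()
>         # Note: the children are appended at the end of the parent.
>         for child in contents:
>             parent.append(child)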
>
> def main():
>     source = '<a><b>This is a <c>Test</c></b></a>'
>     soup = BeautifulSoup(source)
>     print soup
>     remove(soup, 'b')
>     print soup
>
> if __name__ == '__main__':
>     main()
>
> > Is re a good module for that? Basically, if I write a loop that
> > scans the text and tries to match every occurrence of a given regular
> > expression, would that be a good idea?
>
> No, regular expressions are not a very good idea. They get very
> complicated very quickly while often still missing some corner cases.
>
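To make the corner-case point concrete, here is a small sketch (Python 3, standard library only). The pattern and the `naive_unwrap` helper are hypothetical, made up for illustration and not taken from the thread. It handles the flat sample from the example above, but goes wrong as soon as the same tag is nested:

```python
import re

# Hypothetical naive pattern: keep whatever sits between <b> and the next </b>.
NAIVE_B = re.compile(r"<b>(.*?)</b>", re.DOTALL)

def naive_unwrap(html):
    """Single regex pass that tries to drop <b> tags but keep their text."""
    return NAIVE_B.sub(r"\1", html)

flat = "<a><b>This is a <c>Test</c></b></a>"
nested = "<a><b>outer <b>inner</b> tail</b></a>"

# The flat case works as hoped.
print(naive_unwrap(flat))    # <a>This is a <c>Test</c></a>

# The nested case pairs the outer <b> with the inner </b>,
# so one <b>...</b> pair is left behind around the wrong text.
print(naive_unwrap(nested))  # <a>outer <b>inner tail</b></a>
```

A tag written with attributes, such as `<b class="x">`, would not be matched by this pattern at all, which is one more case to patch by hand.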
Thanks a lot for that.
It's true that regular expressions could give me headaches (especially
to find where the tag ends).
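
For anyone who cannot install BeautifulSoup, the same idea can probably be sketched with the standard library's `html.parser` (Python 3). This is a different technique from the one quoted above, and the names `TagStripper` and `strip_tags` are my own, not from any library. It assumes reasonably well-formed input and silently drops comments and doctype declarations:

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Re-emit the markup, dropping the named tags but keeping their content."""

    def __init__(self, drop):
        # convert_charrefs=False so entities are passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.drop:
            # Reuse the original start-tag text so attributes survive as written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are emitted once, as written.
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

def strip_tags(html, drop):
    """Return html with every tag named in drop removed, content kept in place."""
    parser = TagStripper(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

print(strip_tags("<a><b>This is a <c>Test</c></b></a>", ["b"]))
```

Because the parser reports where each tag starts and ends, nested tags of the same name are handled without any extra bookkeeping, and the kept content stays where it was in the document.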