HTML purifier using BeautifulSoup?
Dan Stromberg
strombrg at dcs.nac.uci.edu
Tue Dec 21 13:10:37 EST 2004
Has anyone tried to construct an HTML janitor script using BeautifulSoup?
My situation:
I'm trying to convert a series of web pages from HTML to PalmDoc format
using Plucker, which is written in Python. The Plucker project suggests
passing HTML through "tidy" to get well-formed HTML for Plucker to work
with.
However, some of the pages I want to convert are so bad that even tidy
pukes on them.
I was thinking that BeautifulSoup might be more tolerant of really bad
HTML, which led me to the question this post started out with. :)
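To make the idea concrete, here is a rough sketch of the kind of janitor I have in mind. It is only an illustration under assumptions: it uses the bs4 packaging of BeautifulSoup with Python's built-in "html.parser" backend, and purify_html is a made-up name, not anything Plucker or BeautifulSoup provides. It parses whatever it is given and writes the tree back out, so every tag that was opened gets closed.

```python
# Sketch only: assumes the bs4 package (a later packaging of BeautifulSoup)
# and the standard library's "html.parser" backend.
from bs4 import BeautifulSoup


def purify_html(raw_html):
    """Parse possibly broken HTML and re-serialize it.

    purify_html is a hypothetical helper name. The parser builds a tree
    from whatever it can recognize, and serializing that tree emits a
    closing tag for every element that was opened.
    """
    soup = BeautifulSoup(raw_html, "html.parser")
    return str(soup)


if __name__ == "__main__":
    # Deliberately broken input: nothing here is ever closed.
    broken = "<html><body><p>Unclosed <b>bold<p>Another paragraph"
    print(purify_html(broken))
```

The output could then be handed to tidy or straight to Plucker. Whether the repaired nesting matches what the page author meant is a separate question; this only guarantees balanced tags.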
Thanks!