beautifulsoup vs. tidy
uche.ogbuji at gmail.com
Sun Jul 2 22:28:30 EDT 2006
bruce wrote:
> Hi Paddy,
>
> That's exactly what I'm trying to accomplish. I've used tidy, but the
> result still seems to generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> The xpath/libxml functions in the perl app complain about the file. My
> thought is that either tidy isn't cleaning enough, or the perl xpath/libxml
> functions are too strict!
>
> That's why I decided to see if anyone on the Python side has
> experienced (or solved) this problem.
FWIW here's my usual approach:
http://copia.ogbuji.net/blog/2005-07-22/Beyond_HTM
Personally, I avoid Tidy. I've too often seen it crash or hang on
really bad HTML. TagSoup seems to be built like a tank. I've also
never seen BeautifulSoup choke, but I don't use it as much as TagSoup.
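Every tool in the pipeline quoted above depends on the cleaning step always emitting well-formed markup, no matter how broken the input is. As a rough illustration of what that step has to guarantee, here is a minimal sketch that uses only Python's standard library (html.parser plus xml.etree.ElementTree). It is a swapped-in technique, not how Tidy, TagSoup or BeautifulSoup actually work, and the names `SoupToTree` and `clean_html` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take content; they become empty XML elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class SoupToTree(HTMLParser):
    """Hypothetical minimal tag-soup repairer (not Tidy, TagSoup or BeautifulSoup).

    Keeps a stack of open elements, ignores stray end tags, and implicitly
    closes anything left open, so the result is always a well-formed tree.
    It does NOT apply HTML's implied-end-tag rules (e.g. a <p> inside a <p>
    nests instead of closing the first paragraph).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html")  # synthetic root element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the synthetic root already plays this role
        # Valueless attributes (e.g. <input disabled>) arrive as None.
        elem = ET.SubElement(self.stack[-1], tag,
                             {name: value or "" for name, value in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; if none is open,
        # the end tag is stray and silently dropped.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def clean_html(raw):
    """Parse messy HTML and return the root of a well-formed ElementTree."""
    parser = SoupToTree()
    parser.feed(raw)
    parser.close()
    return parser.root


if __name__ == "__main__":
    messy = ('<p>Hello <b>world<p>Links: <a href="/x">x</a><br>'
             '<a href="/y">y</body>')
    xml_text = ET.tostring(clean_html(messy), encoding="unicode")
    # Round-trip through a strict XML parser, then use ElementTree's
    # limited XPath subset to pull out the links.
    doc = ET.fromstring(xml_text)
    print([a.get("href") for a in doc.iterfind(".//a[@href]")])
    # prints ['/x', '/y']
```

The design choice here is to never reject input: anything unclosed gets closed and anything unmatched gets dropped, so the strict XML/XPath consumer downstream always receives a parseable file. Dedicated cleaners also apply HTML's implied-end-tag rules, which this sketch skips.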
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/