beautifulsoup .vs tidy

uche.ogbuji at gmail.com uche.ogbuji at gmail.com
Sun Jul 2 22:28:30 EDT 2006


bruce wrote:
> hi paddy...
>
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
>  initFile -> tidy ->cleanFile -> perl app (using xpath/livxml)
>
> the xpath/linxml functions in the perl app complain regarding the file. my
> thought is that tidy isn't cleaning enough, or that the perl xpath/libxml
> functions are too strict!
>
> which is why i decided to see if anyone on the python side has
> experienced/solved this problem..

FWIW here's my usual approach:

http://copia.ogbuji.net/blog/2005-07-22/Beyond_HTM

Personally, I avoid Tidy.  I've too often seen it crash or hang on
really bad HTML.  TagSoup seems to be built like a tank.  I've also
never seen BeautifulSoup choke, but I don't use it as much as TagSoup.

--
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/




More information about the Python-list mailing list