Looking for a decent HTML parser for Python...
Stephen Eilert
spedrosa at gmail.com
Wed Dec 6 11:41:40 EST 2006
Fredrik Lundh wrote:
> > Except it appears to be buggy or, at least, not very robust. There are
> > websites for which it falsely terminates early in the parsing.
>
> which probably means that the sites are broken. the amount of broken
> HTML on the net is staggering, as is the amount of code in a typical web
> browser for dealing with all that crap. for a more tolerant parser, see:
>
> http://www.crummy.com/software/BeautifulSoup/
>
> </F>
+1 for BeautifulSoup.
The documentation is quite brief and sometimes confusing, but I've
found it the easiest parser I've ever worked with.
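
To give an idea of what "tolerant" means in practice, here is a minimal sketch. It assumes the current `bs4` package (BeautifulSoup 4, installed with `pip install beautifulsoup4`), which is newer than the release discussed in this thread, and the sample markup is made up for illustration. The markup never closes its tags, yet the links and paragraphs can still be pulled out:

```python
# Assumes the third-party bs4 package (BeautifulSoup 4), not the 2006-era release.
from bs4 import BeautifulSoup

# Made-up, deliberately broken markup: <p>, <b> and <a> are never closed,
# and there is no </body> or </html>.
broken_html = """
<html><body>
<p>First paragraph with a <b>bold word
<p>Second paragraph with a <a href="http://www.crummy.com/software/BeautifulSoup/">link
"""

# "html.parser" selects Python's built-in parser as the backend,
# so no extra dependency such as lxml is required.
soup = BeautifulSoup(broken_html, "html.parser")

# Collect the href of every <a> tag that has one.
links = [a["href"] for a in soup.find_all("a", href=True)]

# Count <p> tags; both are found even though neither was closed.
paragraph_count = len(soup.find_all("p"))

print(links)
print(paragraph_count)
```

The exact tree that comes out of broken input depends on which backend parser is chosen, so treat the structure as a best effort rather than a guarantee.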
Stephen