[issue6191] HTMLParser attribute parsing - 2 test cases when it fails
R. David Murray
report at bugs.python.org
Thu Jun 4 23:50:27 CEST 2009
R. David Murray <rdmurray at bitdance.com> added the comment:
In doing web scraping I started using BeautifulSoup precisely because it
was very lenient in what HTML it accepted (I haven't written such an app
for a while, so I'm not sure what BeautifulSoup currently does... I
thought I heard it was now using HTMLParser...).
There are a lot of messed-up web pages out there.
I don't have time right now to evaluate your particular cases, but my
rule of thumb would be that if the major web browsers do something
"reasonable" with these cases, then a python tool designed to read web
pages should do so as well, where possible. ("Be liberal in what you
accept, and strict in what you generate.")
That said, I'm not sure what HTMLParser's design goals are, so this may
not be an appropriate goal for the module.
----------
nosy: +r.david.murray
priority: -> normal
status: pending -> open
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6191>
_______________________________________