[issue1486713] HTMLParser : A auto-tolerant parsing mode

R. David Murray report at bugs.python.org
Tue Aug 24 15:13:41 CEST 2010


R. David Murray <rdmurray at bitdance.com> added the comment:

2.6 is now in security-fix-only mode.  Since this is a new feature, it can only go into 3.2.

Can you provide a patch against py3k trunk?

I've only glanced at the patch briefly, but one thing that concerns me is 'warning file'.  I suppose that either the logging module or perhaps the warnings module should be used instead.  We should look at how other stdlib modules handle this kind of thing.  Or perhaps warnings shouldn't be generated at all, since the default will be strict and therefore the programmer has consciously selected tolerant mode.
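
For illustration only, here is a minimal sketch of what routing such messages through the warnings module might look like. HTMLParseWarning and WarningParser are hypothetical names, not part of the patch or the stdlib, and the tag-stack check is just a stand-in for whatever conditions the patch actually reports.

```python
import warnings
from html.parser import HTMLParser


class HTMLParseWarning(UserWarning):
    """Hypothetical warning category for recoverable markup problems."""


class WarningParser(HTMLParser):
    """Hypothetical parser that warns on mismatched end tags."""

    def __init__(self):
        super().__init__()
        self._open_tags = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()
        else:
            # Report through the warnings machinery instead of a file.
            warnings.warn(f"unexpected </{tag}>", HTMLParseWarning, stacklevel=2)


# Callers decide what to do with the warnings via the usual filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    WarningParser().feed("<p>text</b></p>")
```

With this approach the caller can silence, log, or escalate the warnings through the standard filter mechanism rather than managing a separate file.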

One stdlib model we could follow is that of the email module: have a 'defects' attribute that collects the errors.  email6, by the way, is going to have both 'tolerant' and 'strict' modes, and in that case the default is tolerant (and always has been) out of respect for Postel's law, which is enshrined in the email RFCs.  If the HTTP standards have a similar recommendation to accept "dirty" input when possible, we could make an argument for changing HTMLParser's default to tolerant.
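
A rough sketch of the 'defects' idea, to make the comparison concrete. The first part shows the email module's existing behaviour. The second part is a hypothetical DefectCollectingParser (not part of the patch or the stdlib) that applies the same pattern to HTMLParser, using a simplistic tag-stack check as an assumed example of a recoverable problem.

```python
import email
import email.errors
from html.parser import HTMLParser

# Existing behaviour: the email parser keeps going on bad input and
# records what it found in the message's 'defects' list.
msg = email.message_from_string("Content-Type: multipart/mixed\n\nbody\n")
email_defects = msg.defects  # list of email.errors.MessageDefect instances


class DefectCollectingParser(HTMLParser):
    """Hypothetical sketch: collect problems instead of raising or warning."""

    def __init__(self):
        super().__init__()
        self.defects = []     # modeled on the email module's attribute
        self._open_tags = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()
        else:
            line, col = self.getpos()
            self.defects.append(
                f"unexpected </{tag}> at line {line}, column {col}"
            )


parser = DefectCollectingParser()
parser.feed("<p><b>bold</i></p>")
parser.close()
# parser.defects now holds one entry per mismatched end tag.
```

The parse always completes, and the programmer inspects parser.defects afterwards if they care about the problems that were tolerated.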

----------
versions:  -Python 2.6, Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1486713>
_______________________________________