[issue14251] HTMLParser decode issue

Tue Mar 13 01:11:31 CET 2012

Ezio Melotti <ezio.melotti at gmail.com> added the comment:

I tested this again, and indeed a bare s.decode() is not enough to fix the problem. The attribute might contain non-ASCII characters, which will result in an error (see, for example, the "test.py" script attached to #3932). The correct solution is to decode the page before passing it to the parser.
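A minimal sketch of "decode first, then parse" with the Python 3 html.parser module. It assumes the raw page is UTF-8. In practice the encoding has to be determined from the HTTP headers or the document itself. The AttrCollector subclass is a hypothetical helper written only for illustration:

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


# Raw bytes as they might arrive from the network (assumed UTF-8 here).
raw = '<a title="café">link</a>'.encode("utf-8")

# Decode the whole page up front, then feed the resulting str to the parser,
# so attribute values with non-ASCII characters arrive as proper text.
page = raw.decode("utf-8")
parser = AttrCollector()
parser.feed(page)
parser.close()
print(parser.starttags)  # [('a', [('title', 'café')])]
```

Because the parser only ever sees str, there is no need to decode individual attribute values after the fact.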

----------
resolution:  -> duplicate
stage:  -> committed/rejected
status: open -> closed
superseder:  -> HTMLParser cannot handle '&' and non-ascii characters in attribute names
versions:  -Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14251>
_______________________________________