[issue7626] Entity references without semicolon in HTMLParser

R. David Murray report at bugs.python.org
Tue Jan 5 21:13:32 CET 2010


R. David Murray <rdmurray at bitdance.com> added the comment:

w3m (a text-mode browser) does not treat &eacute without the ; as an entity ref (it puts &eacute literally into the display), while Firefox turns it into an e-acute with or without the ;.  I'm sure somebody somewhere has a table listing which browsers behave which way.

Firefox does render, e.g., &test without a trailing semicolon as &test.  If you want to mirror that result in code using HTMLParser, you can implement the behavior in your entityref handler (the handle_entityref method).
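
A minimal sketch of such a handler, assuming the Python 3 html.parser module rather than the older HTMLParser module. The names LiteralUnknownEntities and render are made up for illustration. convert_charrefs=False is needed on Python 3 because otherwise the parser resolves references itself and never calls handle_entityref.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class LiteralUnknownEntities(HTMLParser):
    """Hypothetical example class: collect text, keep unknown entities literal."""

    def __init__(self):
        # Without convert_charrefs=False, handle_entityref is never called.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        if name in name2codepoint:
            # Known entity such as "eacute": emit the character it names.
            self.parts.append(chr(name2codepoint[name]))
        else:
            # Unknown entity such as "test": put it back as written.
            self.parts.append("&" + name)


def render(markup):
    """Hypothetical helper: return the text content of markup."""
    parser = LiteralUnknownEntities()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(render("<p>caf&eacute and &test here</p>"))
```

This only covers named references; numeric ones such as &#233; go to handle_charref, which the sketch leaves at its default.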

However, this brings up an interesting issue.  Firefox also renders "&test;" literally.  You can't implement that full behavior using HTMLParser, as far as I can see, since you lose the information as to whether the entity ref was terminated by a semicolon or not. So there may be a legitimate feature request with respect to that issue.
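
To illustrate the lost information, here is a small sketch, again assuming Python 3's html.parser with convert_charrefs=False. EntityRecorder and entity_names are hypothetical names. It records what handle_entityref receives for each form.

```python
from html.parser import HTMLParser


class EntityRecorder(HTMLParser):
    """Hypothetical example class: record every name given to handle_entityref."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.names = []

    def handle_entityref(self, name):
        # Only the bare name arrives; the terminator is not passed along.
        self.names.append(name)


def entity_names(markup):
    """Hypothetical helper: list the entity names seen in markup."""
    parser = EntityRecorder()
    parser.feed(markup)
    parser.close()
    return parser.names


print(entity_names("<p>&test; x</p>"))
print(entity_names("<p>&test x</p>"))
```

Both calls hand the handler the same name, so a handler cannot put the semicolon back only where the source had one.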

----------
nosy: +r.david.murray

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7626>
_______________________________________

