[docs] [issue23144] html.parser.HTMLParser: setting 'convert_charrefs = True' leads to dropped text

Ezio Melotti report at bugs.python.org
Sat Mar 7 17:41:34 CET 2015


Ezio Melotti added the comment:

Here is a patch that fixes the problem.
Even though calling .close() is the correct solution, I preferred to restore the previous behavior and call handle_data as soon as possible.
There is a corner case in which a charref might be cut in half while feeding chunks to the parser. In that case the parser waits for more input, so it might still be necessary to call .close() if an incomplete charref is at the end of the input.
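
A minimal sketch of that corner case, assuming a Python version that includes this fix. TextCollector and the sample strings are hypothetical names used only for illustration; they are not part of the patch.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper that records every handle_data() call."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# A charref split across two feed() calls: the parser holds the text
# back until the reference is complete.
split = TextCollector()
split.feed("a &am")
held_back = list(split.chunks)      # nothing delivered yet
split.feed("p; b")
split_text = "".join(split.chunks)  # delivered once the charref is whole

# An unterminated charref at the very end of the input: the text stays
# buffered until close() flushes it.
trailing = TextCollector()
trailing.feed("x &amp")
before_close = list(trailing.chunks)
trailing.close()
after_close = "".join(trailing.chunks)
```

Here held_back and before_close should both be empty, which is why a final .close() is still needed when the input might end in the middle of a charref.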
Adding context manager support to HTMLParser might also help solve the problem, but that's a separate issue.
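
Until such support exists, a similar effect may be achievable with contextlib.closing from the standard library, which calls .close() when the block exits. This is only a sketch of a possible workaround, not something the patch adds; TextCollector is again a hypothetical helper.

```python
from contextlib import closing
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper that records every handle_data() call."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = TextCollector()
# closing() calls collector.close() on exit, which flushes any text the
# parser was still holding back because of the unterminated charref.
with closing(collector) as parser:
    parser.feed("<p>tail &amp")

tail_text = "".join(collector.chunks)
```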
(Also thanks to Serhiy for the feedback he provided me on IRC.)

----------
keywords: +patch
nosy: +serhiy.storchaka
stage:  -> commit review
versions: +Python 2.7, Python 3.5
Added file: http://bugs.python.org/file38376/issue23144.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23144>
_______________________________________

