[issue6611] HTMLParser cannot deal with mixture of arbitrary data and character reference

Liu DongMiao report at bugs.python.org
Fri Jul 31 09:45:54 CEST 2009


New submission from Liu DongMiao <liudongmiao at gmail.com>:

HTMLParser (Python 2.6.2) cannot deal with a mixture of arbitrary data
and character references.

In lines 365-373, replaceEntities(s) returns unichr(charref), which is a
unicode object and cannot be mixed with arbitrary data held in a str.
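
The failure mode can be sketched roughly as follows. This is not the
HTMLParser source; it is a Python 3 illustration in which bytes stands in
for the Python 2 str type and chr() for unichr(). The helper name
py2_style_concat is hypothetical. It makes explicit the ASCII decoding
that Python 2 applies implicitly when a str and a unicode object are
joined, which is presumably what goes wrong here.

```python
# Python 3 sketch of the Python 2 behaviour described above (an assumption
# about the cause, not a copy of HTMLParser code).
raw = "\u4e2d\u6587".encode("utf-8")  # non-ASCII byte data, like a Python 2 str
ref = chr(20013)                      # what unichr(20013) would yield


def py2_style_concat(byte_data, text):
    # Python 2 decodes the str operand with the ASCII codec before joining
    # it to a unicode object; here that step is written out explicitly.
    return byte_data.decode("ascii") + text


try:
    py2_style_concat(raw, ref)
except UnicodeDecodeError as exc:
    print("mixing failed:", exc.reason)
```

Pure-ASCII byte data passes through this path without error, which is
why the problem only shows up with arbitrary (non-ASCII) input.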

A possible fix: replace unichr(c) with unichr(c).encode('utf-8').
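
A minimal sketch of that idea, again written for Python 3 with bytes in
place of the Python 2 str type. The function name replace_charrefs and
the regular expression are hypothetical and only cover numeric character
references; the sketch also assumes the surrounding data is UTF-8.

```python
import re

# Hypothetical pattern: decimal (&#20013;) or hexadecimal (&#x4e2d;) references.
_CHARREF = re.compile(rb"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def replace_charrefs(data):
    """Replace numeric character references in byte data with UTF-8 bytes."""
    def _sub(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # The proposed fix: encode the character so the result stays bytes
        # and can be joined with the rest of the byte data.
        return chr(code).encode("utf-8")
    return _CHARREF.sub(_sub, data)


raw = "\u4e2d\u6587&#20013;".encode("utf-8")  # byte data plus a reference
fixed = replace_charrefs(raw)
print(fixed == "\u4e2d\u6587\u4e2d".encode("utf-8"))
```

Hard-coding UTF-8 only works when the input document is itself UTF-8;
data in another encoding would end up with mixed encodings.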

----------
components: Library (Lib)
files: chinese.py
messages: 91128
nosy: liudongmiao at gmail.com
severity: normal
status: open
title: HTMLParser cannot deal with mixture of arbitrary data and character reference
type: compile error
versions: Python 2.6
Added file: http://bugs.python.org/file14613/chinese.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6611>
_______________________________________

