[New-bugs-announce] [issue14251] [PATCH]HTMLParser decode issue

rednaks report at bugs.python.org
Sun Mar 11 03:23:15 CET 2012

New submission from rednaks <salexandre.bm at gmail.com>:

Hello!
While parsing some HTML code, I got a decode error (traceback below).

This can be fixed by replacing s, the last argument of the re.sub() call in unescape(), with s.decode(), as in the attached diff file.
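Here is a minimal Python 3 sketch of the idea behind the patch, assuming the attribute value may arrive as bytes. The value is decoded to text before the entity substitution runs, so the replacement text is never joined with raw bytes. The regular expression is the one shown in the unescape() frame of the traceback. The helper names unescape_attr and _replace_entity and the default cp1252 encoding are hypothetical. The encoding is passed explicitly because a bare decode() uses the interpreter's default codec, which is ASCII in Python 2.

```python
import re
from html.entities import name2codepoint

# Same pattern as the re.sub() call in HTMLParser.unescape (see traceback).
ENTITY_RE = re.compile(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));")


def _replace_entity(match):
    """Turn one matched entity into text; leave anything unrecognised as-is."""
    ref = match.group(1)
    try:
        if ref.startswith("#"):
            if ref[1:2] in ("x", "X"):
                return chr(int(ref[2:], 16))  # hex reference, e.g. &#x42;
            return chr(int(ref[1:]))          # decimal reference, e.g. &#65;
    except (ValueError, OverflowError):
        return match.group(0)                 # malformed numeric reference
    if ref in name2codepoint:
        return chr(name2codepoint[ref])       # named entity, e.g. &amp;
    return match.group(0)                     # unknown name


def unescape_attr(value, encoding="cp1252"):
    """Hypothetical helper: decode bytes first (encoding is an assumption),
    then substitute entities on text only."""
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return ENTITY_RE.sub(_replace_entity, value)
```

Because decoding happens once at the boundary, every piece that re.sub() joins is already text, so no implicit codec is involved.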
I also tried running my script under Python 3.2, and it does not parse anything.
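For comparison, here is a minimal Python 3 sketch of feeding HTML to the parser. It assumes the input is bytes in a known encoding, and cp1252 here is an assumption. In Python 3 the module is html.parser and feed() works on str, so the bytes are decoded first. StartTagCollector and collect_start_tags are hypothetical names. The sketch illustrates the str-in, callbacks-out usage rather than diagnosing the 3.2 result.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical subclass that records every start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with entities already unescaped
        self.tags.append((tag, attrs))


def collect_start_tags(raw: bytes, encoding: str = "cp1252"):
    """Decode raw (encoding is an assumption), parse it, return the start tags."""
    parser = StartTagCollector()
    parser.feed(raw.decode(encoding))  # feed() expects str in Python 3
    parser.close()
    return parser.tags


print(collect_start_tags(b'<p><a href="/x" title="caf\x97 &amp; co">y</a></p>'))
```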

  File "/usr/lib/python2.7/HTMLParser.py", line 111, in feed
  File "/usr/lib/python2.7/HTMLParser.py", line 155, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 260, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 410, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 1: ordinal not in range(128)
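The final line is the message Python reports when a byte above 0x7F is decoded with the ascii codec. In 2.7 the likely trigger is that unescape()'s replacement function returns unicode for each entity. re.sub() then has to combine those unicode pieces with the byte-string pieces of the attribute value, and that forces an implicit ASCII decode. The Python 3 sketch below reproduces only that last decoding step. The sample bytes, the helper name ascii_error_message, and the Windows-1252 reading of 0x97 are assumptions.

```python
def ascii_error_message(raw: bytes) -> str:
    """Decode raw with the ASCII codec; return the error text, or "" on success."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# Hypothetical attribute bytes: "x" followed by 0x97 (an em dash in Windows-1252).
raw = b"x\x97"

print(ascii_error_message(raw))
# 'ascii' codec can't decode byte 0x97 in position 1: ordinal not in range(128)

# Naming the page's real encoding avoids the implicit ASCII step entirely.
print(raw.decode("cp1252") == "x\u2014")  # True
```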

components: Library (Lib)
files: patch.txt
messages: 155366
nosy: rednaks
priority: normal
severity: normal
status: open
title: [PATCH]HTMLParser decode issue
type: crash
versions: Python 2.7, Python 3.2
Added file: http://bugs.python.org/file24780/patch.txt

Python tracker <report at bugs.python.org>
