[New-bugs-announce] [issue14251] [PATCH]HTMLParser decode issue
rednaks
report at bugs.python.org
Sun Mar 11 03:23:15 CET 2012
New submission from rednaks <salexandre.bm at gmail.com>:
Hello!
While parsing some HTML I got a UnicodeDecodeError (traceback below).
This issue can be fixed by replacing the last argument, the string s, with
s.decode(), as in the attached diff file.
I also tried running my script under Python 3.2, and it does not parse anything.
  File "/usr/lib/python2.7/HTMLParser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 155, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 260, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 410, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 1: ordinal not in range(128)
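
For illustration, here is a minimal sketch of the same failure and one possible workaround. It is written for Python 3, where the decode step that Python 2 performs implicitly has to be spelled out. Only the byte 0x97 comes from the traceback above. The sample markup, the cp1252 encoding guess, and the AttrCollector class are assumptions made up for this example, and this is not the change made in patch.txt.

```python
from html.parser import HTMLParser

# Hypothetical input: an attribute value containing byte 0x97, as in the
# traceback. The encoding of the real page is unknown; cp1252 is a guess.
raw = b'<a title="a\x97b &amp; c">link</a>'

# Python 2 decodes byte strings with the ASCII codec when they are mixed
# with unicode strings. That decode fails on any byte >= 0x80.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)


class AttrCollector(HTMLParser):
    """Hypothetical parser that records every start-tag attribute."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


# Workaround sketch: decode the bytes with an explicit codec first, then
# feed the resulting text to the parser.
parser = AttrCollector()
parser.feed(raw.decode("cp1252"))
parser.close()

print(error)
print(parser.attrs)
```

Decoding before calling feed() keeps the choice of encoding with the caller, who is more likely to know it than the parser is.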
----------
components: Library (Lib)
files: patch.txt
messages: 155366
nosy: rednaks
priority: normal
severity: normal
status: open
title: [PATCH]HTMLParser decode issue
type: crash
versions: Python 2.7, Python 3.2
Added file: http://bugs.python.org/file24780/patch.txt
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14251>
_______________________________________