[pypy-issue] [issue1082] [PATCH]HTMLParser decode issue

rednaks tracker at bugs.pypy.org
Sat Mar 10 22:50:41 CET 2012


New submission from rednaks <salexandre.bm at gmail.com>:

While parsing some HTML I got a UnicodeDecodeError (traceback below).

The issue can be fixed by replacing the last argument of the re.sub() call, s,
with s.decode(), as in the attached diff file.
I also tried to run my script under Python 3.2, where it does not parse anything.

  File "/usr/lib/python2.7/HTMLParser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python2.7/HTMLParser.py", line 155, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.7/HTMLParser.py", line 260, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/usr/lib/python2.7/HTMLParser.py", line 410, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/usr/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 1: ordinal not in range(128)
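
For illustration, here is a minimal sketch of a caller-side alternative: decode
the raw bytes with an explicit codec before handing them to the parser, so that
entity unescaping only ever sees text. It is written in Python 3 syntax
(html.parser rather than the Python 2.7 HTMLParser module in the traceback).
The AttrCollector class, the sample markup, and the choice of cp1252 are all
assumptions made up for this example; the real encoding of the page is not
known from the report. Note also that on a stock Python 2 a bare s.decode()
uses the default ASCII codec, so naming the codec explicitly may be the safer
choice.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every (tag, attrs) pair the parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, values already unescaped
        self.seen.append((tag, attrs))


# Raw bytes as they might come from a file or socket.
# Byte 0x97 is the same byte the traceback complains about.
raw = b'<a title="a\x97b &amp; c">link</a>'

# Assumption: the document is cp1252-encoded. Decode explicitly, up front.
text = raw.decode("cp1252")

parser = AttrCollector()
parser.feed(text)
parser.close()
```

With the input decoded first, the parser receives only text and the attribute
value is unescaped without any implicit ASCII decoding.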

----------
files: patch.txt
messages: 4057
nosy: pypy-issue, rednaks
priority: bug
status: unread
title: [PATCH]HTMLParser decode issue

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1082>
________________________________________
-------------- next part --------------
diff -ru1 old/HTMLParser.py new/HTMLParser.py
--- old/HTMLParser.py	2012-03-10 23:40:16.000000000 +0100
+++ new/HTMLParser.py	2012-03-10 23:46:36.000000000 +0100
@@ -409,2 +409,2 @@
 
-        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
+        return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s.decode())

