Hi,<div><br></div><div>I am using a simple subclass of HTMLParser like this:</div><div><br></div><div><div>class LinkCollector(HTMLParser):</div><div><br></div><div>    def reset(self):</div><div>        self.links = []</div>
<div>        HTMLParser.reset(self)</div><div><br></div><div>    def handle_starttag(self, tag, attr):</div><div>        if tag in ("a", "link"):</div><div>            key = "href"</div><div>        elif tag in ("img", "script"):</div>
<div>            key = "src"</div><div>        else:</div><div>            return</div><div>        self.links.extend([v for k, v in attr if k == key])</div><div><br></div><div>This gives the following error:</div><div>
<br></div><div><div>Traceback (most recent call last):</div><div>  File "downloader.py", line 209, in &lt;module&gt;</div><div>    if __name__ == "__main__": main()</div><div>  File "downloader.py", line 201, in main</div>
<div>    link_collect.feed(response)</div><div>  File "C:\Python27\lib\HTMLParser.py", line 108, in feed</div><div>    self.goahead(0)</div><div>  File "C:\Python27\lib\HTMLParser.py", line 148, in goahead</div>
<div>    k = self.parse_starttag(i)</div><div>  File "C:\Python27\lib\HTMLParser.py", line 252, in parse_starttag</div><div>    attrvalue = self.unescape(attrvalue)</div><div>  File "C:\Python27\lib\HTMLParser.py", line 393, in unescape</div>
<div>    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)</div><div>  File "C:\Python27\lib\re.py", line 151, in sub</div><div>    return _compile(pattern, flags).sub(repl, string, count)</div>
<div>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 13: ordinal not in range(128)</div></div><div><br></div><div>The rest of the code is available as an attachment. Does anyone know how to solve this?</div>
<div><br></div>-- <br><a href="http://yasar.serveblog.net/" target="_blank">http://yasar.serveblog.net/</a><br><br>
</div>
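<div>A possible explanation, offered as a sketch rather than a confirmed diagnosis: the traceback ends inside unescape, where Python 2.7's HTMLParser replaces entities with unicode strings. If response is a byte string that contains non-ASCII bytes (0xc3 is a typical UTF-8 lead byte), mixing the two forces an implicit ASCII decode, which fails. Decoding the page to text before calling feed avoids that. In the example below, the helper collect_links, the UTF-8 encoding, and the sample markup are assumptions made for illustration; the real page may declare a different charset in its headers.</div>

```python
# -*- coding: utf-8 -*-
try:
    from HTMLParser import HTMLParser      # Python 2, as in the traceback
except ImportError:
    from html.parser import HTMLParser     # Python 3 location of the same class


class LinkCollector(HTMLParser):

    def reset(self):
        self.links = []
        HTMLParser.reset(self)

    def handle_starttag(self, tag, attr):
        if tag in ("a", "link"):
            key = "href"
        elif tag in ("img", "script"):
            key = "src"
        else:
            return
        self.links.extend([v for k, v in attr if k == key])


def collect_links(raw_bytes, encoding="utf-8"):
    """Hypothetical helper: decode the bytes first, then feed text to the parser.

    The default encoding is an assumption; prefer the charset the server reports.
    """
    text = raw_bytes.decode(encoding, "replace")
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


# Invented sample: a non-ASCII character (0xc3 0xb6) next to an entity
# inside an attribute value, the combination that reaches unescape().
sample = b'<a href="/g\xc3\xb6r?a=1&amp;b=2">x</a><img src="logo.png">'
links = collect_links(sample)
```

<div>In downloader.py the equivalent change would be to pass the decoded text, for example link_collect.feed(response.decode("utf-8")), assuming response holds the raw bytes of a UTF-8 page.</div>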