HTMLParser can't read Japanese

John Nagle nagle at animats.com
Tue Apr 13 14:51:07 EDT 2010


    Yes.  Try "cmd /u" to get a Unicode console.

    HTMLParser should already have converted from Shift-JIS
to Unicode, so the "print" is outputting Unicode.
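
    For reference, a minimal sketch of where that conversion happens.
In Python 3, HTMLParser.feed() takes str, so the bytes read from the
response get decoded before parsing. The TagCollector class and the
sample markup are hypothetical stand-ins, and the page is assumed to
be Shift-JIS.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical parser that records start tags instead of printing them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # By this point tag and attrs are ordinary Unicode str objects.
        self.tags.append((tag, attrs))


# Stand-in for the bytes returned by urllib.request.urlopen(...).read().
raw = '<a title="日本語">'.encode("shift_jis")

# Decode the Shift-JIS bytes to str before handing them to the parser.
text = raw.decode("shift_jis")

parser = TagCollector()
parser.feed(text)
parser.close()
```

Collecting the tags rather than printing them keeps the parsing step
independent of whatever the console can display.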

				John Nagle

Stefan Behnel wrote:
> Dodo, 13.04.2010 13:40:
>> Here's a small script to reproduce the error,
>> running Windows 7 with Python 3.1
>>
>> FILE : parseShift.py
>>
>> import urllib.request as url
>> from html.parser import HTMLParser
>>
>> class myParser(HTMLParser):
>>   def handle_starttag(self, tag, attrs):
>>     print("Start of %s tag : %s" % (tag, attrs))
> 
> Your problem is the last line. Your terminal does not support printing 
> the text, so you get an exception here.
> 
> Either change your terminal encoding to a suitable encoding, or write 
> the text to an encoded file instead (see the 'encoding' option of the 
> open() function for that).
> 
> Stefan
> 
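
A minimal sketch of the two workarounds Stefan describes, not a
definitive fix. The helper names write_encoded and console_safe, the
output file name, and the sample line are all hypothetical. The first
helper writes the text to a file with an explicit encoding via open();
the second escapes any character the console encoding cannot
represent, so print() does not raise UnicodeEncodeError.

```python
import os
import sys
import tempfile


def write_encoded(path, lines, encoding="utf-8"):
    """Write lines to a text file using an explicit encoding."""
    with open(path, "w", encoding=encoding) as out:
        for line in lines:
            out.write(line + "\n")


def console_safe(text, encoding):
    """Replace characters the encoding lacks with backslash escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Same format string as the script in the thread, with made-up values.
line = "Start of %s tag : %s" % ("a", [("title", "日本語")])

# Workaround 1: write to an encoded file instead of the console.
out_path = os.path.join(tempfile.gettempdir(), "parse_shift_output.txt")
write_encoded(out_path, [line])

# Workaround 2: print only what the console encoding can represent.
print(console_safe(line, sys.stdout.encoding or "ascii"))
```

On a console that already handles the text, console_safe() returns
the string unchanged; on a narrower code page the Japanese characters
show up as \uXXXX escapes instead of stopping the script.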


