save gb-2312 web page in a .html file
Peter Pei
yantao at telus.com
Wed Dec 26 17:52:57 EST 2007
I am trying to read a web page and save it to an .html file. The problem is
that the web page is GB2312-encoded, and I want to save it to the file either
in the same encoding or as Unicode. I have some code like this:
import urllib2

url = 'http://blah/'
headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
req = urllib2.Request(url, None, headers)
page = urllib2.urlopen(req).read()
file = open('btchina.html', 'wb')
file.write(page.encode('gb-2312'))
file.close()
It is obviously not working, and I am hoping someone can help me.
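One likely cause, offered as a sketch rather than a confirmed diagnosis: read() returns the raw bytes exactly as the server sent them, so if the page really is served as GB2312, those bytes are already in that encoding, and calling encode() on them asks Python to encode data that was never decoded. The helper names save_raw and save_as_utf8 below are hypothetical, the sample bytes stand in for the downloaded page, and gb2312 is used because that is the codec name listed among Python's standard encodings.

```python
def save_raw(page_bytes, path):
    """Write the downloaded bytes unchanged, keeping the server's encoding."""
    with open(path, 'wb') as f:
        f.write(page_bytes)


def save_as_utf8(page_bytes, path, source_encoding='gb2312'):
    """Decode the bytes to Unicode text, then store them as UTF-8.

    source_encoding is an assumption: it must match what the server sent.
    """
    text = page_bytes.decode(source_encoding)
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))


# Stand-in for the result of urlopen(req).read(): two Chinese characters
# encoded as GB2312 (hypothetical sample data, not the real page).
sample_page = u'\u4e2d\u6587'.encode('gb2312')
```

With the code in the question, save_raw(page, 'btchina.html') would keep the original encoding. If the UTF-8 variant is used instead, any charset declaration inside the HTML still names the old encoding and may need editing before browsers render the file correctly.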