save gb-2312 web page in a .html file

Peter Pei yantao at telus.com
Wed Dec 26 17:52:57 EST 2007


I am trying to read a web page and save it to a .html file. The problem is 
that the web page is GB2312-encoded, and I want to save it to the file in 
the same encoding or in Unicode. I have some code like this:
    import urllib2

    url = 'http://blah/'
    headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)' }

    # fetch the page; read() returns the raw bytes sent by the server
    req = urllib2.Request(url, None, headers)
    page = urllib2.urlopen(req).read()

    # try to write the page out as GB-2312
    out = open('btchina.html','wb')
    out.write(page.encode('gb-2312'))
    out.close()

It is obviously not working, and I am hoping someone can help me.
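
A minimal sketch of one way this could be made to work, assuming the server 
really sends GB2312 bytes. The value returned by read() is already an encoded 
byte string, so it can be written out unchanged to keep the original encoding, 
or decoded first and then re-encoded to get UTF-8. The helper name save_page 
and its parameters are hypothetical, not part of urllib2. The codec is spelled 
'gb2312' here because, as far as I know, Python does not register 'gb-2312' 
as an alias.

```python
import io


def save_page(raw, path, source_encoding='gb2312', target_encoding=None):
    """Write downloaded bytes to path and return the bytes written.

    raw is the byte string returned by read(). It is already encoded,
    so it must be decoded (not encoded) before changing its encoding.
    save_page is a hypothetical helper used only for illustration.
    """
    if target_encoding is None:
        # Keep the original encoding: write the bytes exactly as received.
        data = raw
    else:
        # Transcode: bytes -> unicode text -> bytes in the target encoding.
        text = raw.decode(source_encoding)
        data = text.encode(target_encoding)

    # Binary mode so that no newline or encoding translation is applied.
    with io.open(path, 'wb') as out:
        out.write(data)
    return data
```

With the code above this would be called as save_page(page, 'btchina.html'), 
or with target_encoding='utf-8' to convert. If the file is converted to UTF-8, 
any charset declaration inside the HTML will still say GB2312 and may need 
updating.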



