[Tutor] Problems with encoding in BeautifulSoup

Serdar Tumgoren zstumgoren at gmail.com
Tue Aug 18 15:37:20 CEST 2009


> Setting sys.setdefaultencoding() affects all scripts you run and will
> make scripts that you write non-portable. A better solution is to
> properly encode the output, for example
> for company in companies[:4]: # assuming companies is a list
>  print company.encode('cp437')
>
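
To make the quoted suggestion concrete, here is a rough sketch of
explicit encoding. The company names and the encode_for_console helper
are made up for illustration, and cp437 is only right if that is your
console's code page. It builds the encoded byte strings instead of
printing them, so it does not depend on the Python 2 print statement.

```python
# Hypothetical sample data; these names are not from the original thread.
companies = [u'Caf\xe9 Ltd', u'M\xfcller GmbH', u'Acme Inc']

def encode_for_console(text, encoding='cp437'):
    """Encode Unicode text for a console that uses the given code page.

    errors='replace' substitutes '?' for any character the code page
    cannot represent, instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding, 'replace')

# Encode only the first few entries, as in the quoted loop.
encoded = [encode_for_console(name) for name in companies[:4]]
```

Passing 'replace' is a design choice for this sketch: without it, a
character missing from the code page raises an exception.
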
Kent's suggestion appears to be closer to the method recommended in
the Beautiful Soup docs. See the page below:

http://www.crummy.com/software/BeautifulSoup/documentation.html#Why%20can%27t%20Beautiful%20Soup%20print%20out%20the%20non-ASCII%20characters%20I%20gave%20it?

Here's some sample code:

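import codecs
import sys

# Get the UTF-8 stream writer class, then wrap stdout with it so that
# Unicode strings are encoded as UTF-8 on output (Python 2).
streamWriter = codecs.getwriter('utf-8')
sys.stdout = streamWriter(sys.stdout)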
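
If you want to see what the wrapper actually writes, here is a small
sketch that wraps an in-memory byte buffer instead of sys.stdout. The
buffer and the make_utf8_writer helper are my own stand-ins for
illustration, not something from the Beautiful Soup docs.

```python
import codecs
import io

def make_utf8_writer(byte_stream):
    """Wrap a byte stream so Unicode text written to it is UTF-8 encoded."""
    return codecs.getwriter('utf-8')(byte_stream)

# Hypothetical stand-in for a byte-oriented stdout.
buffer = io.BytesIO()
writer = make_utf8_writer(buffer)

# Write Unicode text containing non-ASCII characters.
writer.write(u'Caf\xe9 M\xfcller\n')

# The underlying stream now holds the UTF-8 bytes.
encoded = buffer.getvalue()
```

Each non-ASCII character ends up as a two-byte UTF-8 sequence in the
buffer, which is what a wrapped stdout would send to the terminal.
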

You might also want to spend some time reading up on encoding. Here
are a few guides that I found useful:

http://eric.themoritzfamily.com/2008/11/21/python-encodings-and-unicode/
http://evanjones.ca/python-utf8.html
http://wesc.livejournal.com/1743.html

