Serge.Orlov at gmail.com
Fri Jun 16 07:39:21 CEST 2006
William Xu wrote:
> Hi, all,
> This piece of code used to work well. I guess the error started occurring
> after some upgrade.
> >>> import urllib
> >>> from BeautifulSoup import BeautifulSoup
> >>> url = 'http://www.google.com'
> >>> port = urllib.urlopen(url).read()
> >>> soup = BeautifulSoup()
> >>> soup.feed(port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
>     self.rawdata = self.rawdata + data
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565: ordinal not in range(128)
> Any ideas to solve this?
According to the documentation chapter "Beautiful Soup Gives You Unicode,
Dammit", Beautiful Soup fully supports Unicode, so this is probably a bug.
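For what it's worth, the traceback points at a plain string concatenation. In Python 2 that raises exactly this error when a unicode string is joined with a byte string containing non-ASCII bytes, because the bytes are implicitly decoded as ASCII. Below is a minimal sketch of a possible workaround: decode the page explicitly before handing it to the parser. It is written for a current Python rather than 2.3, and the sample bytes, the to_text helper, and the latin-1 encoding are all assumptions for illustration; the real encoding of the page would have to be checked.

```python
# Hypothetical stand-in for the downloaded page: bytes with one non-ASCII byte.
page_bytes = b"<p>price: \xb8</p>"


def to_text(data, encoding="latin-1"):
    """Decode raw bytes with an explicit encoding (assumed here, not detected)."""
    return data.decode(encoding)


# Decoding as ASCII fails in the same way as the implicit conversion
# reported in the traceback.
try:
    page_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with an explicit (assumed) encoding succeeds and yields text
# that can be passed on to a parser.
text = to_text(page_bytes)

print(ascii_error)
print(repr(text))
```

Whether this avoids the problem in 3.0.1 is not something I have verified; it only shows where the ASCII decode comes from.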
> version info:
> Python 2.3.5 (#2, Mar 7 2006, 12:43:17)
> [GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2
> python-beautifulsoup: 3.0.1-1
Upgrading python-beautifulsoup is a good idea, since there have been two
bug-fix releases after 3.0.1.