[Tutor] Problems with encoding in BeautifulSoup
Kent Johnson
kent37 at tds.net
Tue Aug 18 13:59:26 CEST 2009
On Tue, Aug 18, 2009 at 12:18 AM, Mal Wanstall<m.wanstall at gmail.com> wrote:
> On Tue, Aug 18, 2009 at 9:00 AM, Eduardo Vieira<eduardo.susan at gmail.com> wrote:
>> Here is the Error output:
>> utf-8
>> Traceback (most recent call last):
>>   File "C:\myscripts\encondingproblem.py", line 13, in <module>
>>     print companies[:4]
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 373: ordinal not in range(128)
>
> It's caused by Python not wanting to send non-ASCII characters to your
> terminal. To override this you need to create a sitecustomize.py file
> in your /usr/lib/python/ folder and put the following in it:
>
> import sys
> sys.setdefaultencoding("utf-8")
>
> This will set the default encoding in Python to UTF8 and you should
> stop getting these parsing errors. I dealt with this recently when I
> was playing around with some international data.
Eduardo is on Windows so his terminal encoding is probably not utf-8.
More likely it is cp437.
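
Rather than guessing, you can ask Python what encoding it thinks the
terminal uses. A minimal sketch; the helper name terminal_encoding and
the 'ascii' fallback are my own choices for illustration, not part of
any library:

```python
import sys

def terminal_encoding(stream=None, fallback="ascii"):
    """Return the encoding reported by a stream, or a fallback.

    Hypothetical helper: stream.encoding can be None or missing,
    for example when output is redirected to a file or a pipe.
    """
    if stream is None:
        stream = sys.stdout
    return getattr(stream, "encoding", None) or fallback

# The name Python reports for the current stdout, e.g. 'cp437'
# on many English-language Windows consoles.
detected = terminal_encoding()
```

If this reports something other than cp437 on your machine, use that
name in the encode() call instead.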
Calling sys.setdefaultencoding() in sitecustomize.py affects every
script you run and makes the scripts you write non-portable, because
they will only work on machines with the same sitecustomize.py. A
better solution is to encode the output explicitly, for example:
  for company in companies[:4]:  # assuming companies is a list
      print company.encode('cp437')
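
One caveat: encode() itself raises UnicodeEncodeError if a string
contains a character the target code page does not have. A rough
sketch of a more forgiving version follows; the helper name
encode_for_terminal and the sample company names are made up for
illustration, and errors='replace' is just one possible policy:

```python
def encode_for_terminal(text, encoding, errors="replace"):
    """Encode a unicode string to bytes for a given terminal encoding.

    Hypothetical helper. With errors='replace', characters that the
    encoding cannot represent become '?' instead of raising
    UnicodeEncodeError.
    """
    return text.encode(encoding, errors)

# Made-up sample data standing in for the scraped company names.
companies = [u"Caf\xe9 Central", u"Acme \u4e2d"]

# Byte strings ready to be written to a cp437 terminal.
encoded = [encode_for_terminal(c, "cp437") for c in companies[:4]]
```

In Python 2 you can print each item of encoded directly. The trade-off
is that unrepresentable characters are silently lost, so use
errors='strict' if you would rather see the exception.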
Kent