[Tutor] Error with incorrect encoding
Kent Johnson
kent37 at tds.net
Tue Apr 15 18:43:39 CEST 2008
Oleg Oltar wrote:
> I am trying to parse an html page. Have following error while doing that
>
>
> src = sel.get_html_source()
> links = re.findall(r'<a class="al4"[^<]*</a>', src)
> for link in links:
>     print link
Presumably get_html_source() is returning unicode, so each link is a
unicode string. To print it, the unicode must be encoded to bytes. When
Python cannot determine the encoding of the output stream it falls back
to ascii, which fails on any non-ASCII character and is probably the
error you are seeing.
Try
    print link.encode('xxx')
where 'xxx' is the value of sys.stdout.encoding, most likely either
'utf-8' or 'windows-1252' depending on your platform.
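
Here is a rough sketch of the same idea, in case it helps. The sample
link text, the helper name encode_for_output() and the FakeStream class
are all made up for illustration; nothing here comes from your script.
It reads the encoding from the output stream, falls back to utf-8 when
the stream does not report one, and replaces characters the encoding
cannot represent instead of raising an error.

```python
import sys

# Made-up sample standing in for one regex match from the page source.
link = u'<a class="al4" href="#">caf\xe9</a>'

# Encoding as ascii fails as soon as a non-ASCII character shows up.
ascii_error = None
try:
    link.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc


def encode_for_output(text, stream=None, fallback='utf-8'):
    """Encode text with the stream's encoding (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout
    # stream.encoding can be missing or None, e.g. when output is piped.
    encoding = getattr(stream, 'encoding', None) or fallback
    # 'replace' substitutes '?' for characters the encoding cannot represent.
    return text.encode(encoding, 'replace')


class FakeStream(object):
    """Stand-in for an output stream with a chosen encoding (illustration only)."""

    def __init__(self, encoding):
        self.encoding = encoding


as_utf8 = encode_for_output(link, FakeStream(None))
as_cp1252 = encode_for_output(link, FakeStream('windows-1252'))
as_ascii = encode_for_output(link, FakeStream('ascii'))
```

With a helper like this, the loop body would become
print encode_for_output(link). The 'replace' handler is a judgment
call: it never fails, but it silently turns unencodable characters
into '?', so use 'strict' instead if you would rather see the error.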
Kent