Trouble Encoding

deelan ggg at zzz.it
Tue Jun 7 12:22:54 CEST 2005


fingermark at gmail.com wrote:
> I'm using feedparser to parse the following:
> 
> <div class="indent text">Adv: Termite Inspections! Jenny Moyer welcomes
> you to her HomeFinderResource.com TM A "MUST See &hellip;</div>
> 
> I'm receiving the following error when I try to print feedparser's
> output for the above text:
> 
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
> position 86: ordinal not in range(256)
> 
> Why is this happening and where does the problem lie?

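It seems that the Unicode character U+201C isn't part
of the Latin-1 charset, which only covers U+0000 to U+00FF.
print has to encode a unicode string before writing it out,
and it uses your terminal's encoding (Latin-1, judging by the
error message), so that character can't be represented. See: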

"LEFT DOUBLE QUOTATION MARK"
<http://www.fileformat.info/info/unicode/char/201c/index.htm>
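To see both halves of the problem in one place, here is a minimal Python 3 sketch (the session below is Python 2; `fits_latin1` and `latin1_stdout` are made-up names for illustration). It checks the code point against Latin-1's U+0000 to U+00FF range, then uses an `io.TextIOWrapper` configured for Latin-1 as a stand-in for a Latin-1 terminal, on the assumption that writing to it behaves like print's implicit encode:

```python
# Python 3 sketch; the interactive session in this post is Python 2.
import io

QUOTE = "\u201c"  # LEFT DOUBLE QUOTATION MARK

def fits_latin1(text):
    """Return True if every code point lies in Latin-1's range U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in text)

print(hex(ord(QUOTE)))          # 0x201c, well above 0xff
print(fits_latin1("caf\xe9"))   # True: U+00E9 is part of Latin-1
print(fits_latin1(QUOTE))       # False

# A text stream set to Latin-1 acts like a Latin-1 terminal: writing to it
# encodes implicitly, and that encode step is where the error is raised.
latin1_stdout = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
try:
    latin1_stdout.write("A " + QUOTE + "MUST See")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc)
```

The exception text should match the one in the original report apart from the position, since only the codec and the character are the same.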

Try encoding the feedparser output as UTF-8 instead, or
pass "replace" as the errors argument of the encode() method.

 >>> c = u'\u201c'
 >>> c
u'\u201c'
 >>> c.encode('utf-8')
'\xe2\x80\x9c'
 >>> print c.encode('utf-8')

OK, now let's try "replace":

 >>> c.encode('latin-1', 'replace')
'?'

Using "replace" will not raise an error, but it will replace
the offending character with a question mark.
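encode() accepts other error handlers besides "replace". Below is a minimal Python 3 sketch comparing the built-in ones on a snippet like the one from the feed (`try_handlers` and `QUOTED` are hypothetical names). Since the text came from HTML, "xmlcharrefreplace" may be worth a look, because it keeps the quotes as numeric character references instead of dropping them:

```python
# Python 3 sketch comparing built-in str.encode() error handlers.
QUOTED = "A \u201cMUST See\u201d"  # U+201C / U+201D curly quotes

def try_handlers(text, encoding="latin-1"):
    """Map each built-in error handler name to the encoded bytes, or None on failure."""
    results = {}
    for handler in ("strict", "replace", "ignore", "xmlcharrefreplace"):
        try:
            results[handler] = text.encode(encoding, handler)
        except UnicodeEncodeError:
            results[handler] = None  # "strict" refuses unencodable characters
    return results

for name, encoded in try_handlers(QUOTED).items():
    print(f"{name:18} {encoded!r}")

# UTF-8 can represent every code point, so this round trip is lossless.
assert QUOTED.encode("utf-8").decode("utf-8") == QUOTED
```

With Latin-1, "replace" gives b'A ?MUST See?', "ignore" gives b'A MUST See', and "xmlcharrefreplace" gives b'A &#8220;MUST See&#8221;'. "strict" is encode()'s default, which is why the plain print failed.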

HTH.

-- 
deelan <http://www.deelan.com/>

More information about the Python-list mailing list