[2.5.1] ShiftJIS to Unicode?
Gilles Ganault
nospam at nospam.com
Thu Nov 27 06:46:21 EST 2008
On Thu, 27 Nov 2008 01:00:28 +0000, MRAB <google at mrabarnett.plus.com>
wrote:
>No problem here:
>
> >>> import urllib
> >>> data = urllib.urlopen("http://www.amazon.co.jp/").read()
> >>> decoded_data = data.decode("shift-jis")
> >>>
Thanks, but it seems like some pages contain ShiftJIS mixed with some
other code page, and Python complains when trying to display this. I
ended up not displaying the string, and just sending it directly to
the database:
========
title = None
m = firsttry.search(the_page)
if m:
    try:
        title = m.group(1).decode('shift-jis').strip()
    # decode() raises UnicodeDecodeError, not UnicodeEncodeError
    except UnicodeDecodeError:
        title = m.group(1).decode('iso8859-1').strip()
    except:
        title = ""
else:
    m = secondtry.search(the_page)
    if m:
        try:
            title = m.group(1).decode('shift-jis').strip()
        except UnicodeDecodeError:
            title = m.group(1).decode('iso8859-1').strip()
        except:
            title = ""
    else:
        print "Nothing found for ISBN %s" % isbn

if title:
    #UnicodeEncodeError: 'charmap' codec can't encode characters in
    #position 49-55: character maps to <undefined>
    #print "Found : %s" % title
    print "Found stuff"
    sql = 'INSERT INTO books (title) VALUES (?)'
    cursor.execute(sql,(title,))
========
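The two failures above are separate problems: decoding the page bytes can raise UnicodeDecodeError, while printing the decoded title can raise UnicodeEncodeError when the console code page has no Japanese characters. A minimal sketch of handling each, assuming Python 3 syntax (the original post uses 2.5.1); the helper names decode_title and console_safe are hypothetical, and trying cp932 as a second encoding is only a guess at what the "other code page" might be:

```python
def decode_title(raw, encodings=("shift_jis", "cp932")):
    """Decode raw bytes, trying each candidate encoding in turn."""
    for enc in encodings:
        try:
            return raw.decode(enc).strip()
        except UnicodeDecodeError:
            continue
    # Last resort: keep what can be decoded and substitute the rest.
    return raw.decode(encodings[0], "replace").strip()


def console_safe(text, encoding):
    """Return text with characters the target encoding lacks
    rewritten as backslash escapes, so printing cannot fail."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


if __name__ == "__main__":
    raw = "日本語のタイトル ".encode("shift_jis")
    title = decode_title(raw)
    # cp1252 stands in for a Western console code page here.
    print("Found : %s" % console_safe(title, "cp1252"))
```

The decoded title itself is left untouched, so it can still be passed to cursor.execute() as Unicode; only the copy that is printed gets escaped.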
Thank you