Encoding/decoding: Still don't get it :-/
Gilles Ganault
nospam at nospam.com
Mon Mar 16 05:46:05 EDT 2009
On Fri, 13 Mar 2009 14:24:52 +0100, Peter Otten <__peter__ at web.de>
wrote:
>It seems the database gives you the strings as unicode. When a unicode
>string is printed python tries to encode it using sys.stdout.encoding
>before writing it to stdout. As you run your script on the windows command
>line that encoding seems to be cp437. Unfortunately your database contains
>characters that cannot be expressed in that encoding.
Vielen Dank for the help :) I hadn't thought about the code page used
to display data in the DOS box in XP.
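
For anyone else hitting this, here is a small sketch of the console problem Peter describes. It is written for Python 3 (the code in this thread is Python 2), and the helper name can_encode and the sample strings are made up for illustration. It only checks whether a string fits in a given code page, and shows a lossy fallback for printing.

```python
# Python 3 sketch; can_encode and the sample strings are illustrative only.
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# "e acute" exists in cp437, "a tilde" does not.
fits = can_encode("caf\u00e9", "cp437")
does_not_fit = can_encode("S\u00e3o Paulo", "cp437")

# Lossy fallback for console output: characters missing from the code page
# are replaced with '?' instead of raising UnicodeEncodeError.
safe = "S\u00e3o Paulo".encode("cp437", errors="replace").decode("cp437")
print(fits, does_not_fit, safe)
```

The fallback loses information, so it is only suitable for display, not for anything written back to the database.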
It turns out that the HTML page from which I was trying to extract
data using regexes was encoded in ISO-8859-1 instead of UTF-8. The
SQLite wrapper expects Unicode strings only, so it had a problem with
the byte strings that contained non-ASCII characters.
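
To illustrate the mismatch, a minimal Python 3 sketch: the same bytes decode cleanly as ISO-8859-1 but are not valid UTF-8. The sample bytes and the helper name decode_page_text are assumptions for the example, not taken from the real page.

```python
# Python 3 sketch; the sample bytes are invented for illustration.
raw = b"Caf\xe9 de la Gare"  # 0xE9 is "e acute" in ISO-8859-1

def decode_page_text(raw_bytes, declared_charset="iso-8859-1"):
    """Decode bytes using the charset the page declares in its meta tag."""
    return raw_bytes.decode(declared_charset)

text = decode_page_text(raw)

# The same bytes are not valid UTF-8: 0xE9 starts a multi-byte sequence
# that is never completed.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(text), utf8_ok)
```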
For those interested, here's how I solved it, although there's likely
a smarter way to do it:
============
data = re_data.search(response)
if data:
    name = data.group(1).strip()
    address = data.group(2).strip()

    # The page declares: content="text/html; charset=iso-8859-1"
    # so turn the byte strings into unicode before handing them to SQLite.
    name = name.decode('iso8859-1')
    address = address.decode('iso8859-1')

    sql = 'BEGIN;'
    sql = sql + 'UPDATE companies SET name=?,address=? WHERE id=?;'
    sql = sql + "COMMIT"

    try:
        cursor.execute(sql, (name,address,id) )
    except:
        print "Failed UPDATING"
        raise
else:
    print "Pattern not found"
============
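
For comparison, a rough Python 3 sketch of the same idea using the standard library's sqlite3 module, which is not necessarily the wrapper used above. The regex, the sample response, the table layout and the function name update_company are all assumptions made up for the example. The connection is used as a context manager so the UPDATE is committed on success and rolled back on error, instead of spelling out BEGIN/COMMIT in the SQL string.

```python
import re
import sqlite3

# Hypothetical pattern; the real page layout is not shown in this thread.
RE_DATA = re.compile(rb"<name>(.*?)</name>\s*<address>(.*?)</address>", re.S)

def update_company(conn, company_id, response, charset="iso-8859-1"):
    """Extract name/address from raw page bytes and store them as text."""
    match = RE_DATA.search(response)
    if match is None:
        return False
    # Decode with the charset declared by the page before touching SQLite.
    name = match.group(1).strip().decode(charset)
    address = match.group(2).strip().decode(charset)
    with conn:  # commits on success, rolls back if execute() raises
        conn.execute(
            "UPDATE companies SET name=?, address=? WHERE id=?",
            (name, address, company_id),
        )
    return True

# Throwaway in-memory database, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute("INSERT INTO companies (id) VALUES (1)")

response = (
    b"<name> Caf\xe9 Fran\xe7ais </name>"
    b"<address> 1 rue de l'\xc9glise </address>"
)
updated = update_company(conn, 1, response)
row = conn.execute("SELECT name, address FROM companies WHERE id=1").fetchone()
print(updated, row)
```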
Thanks again.