Encoding/decoding: Still don't get it :-/

Gilles Ganault nospam at nospam.com
Mon Mar 16 05:46:05 EDT 2009


On Fri, 13 Mar 2009 14:24:52 +0100, Peter Otten <__peter__ at web.de>
wrote:
>It seems the database gives you the strings as unicode. When a unicode
>string is printed python tries to encode it using sys.stdout.encoding
>before writing it to stdout. As you run your script on the windows command
>line that encoding seems to be cp437. Unfortunately your database contains
>characters that cannot be expressed in that encoding.
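
To reproduce the failure Peter describes without a database, here is a small Python 3 sketch. The thread itself uses Python 2, where the same calls apply to unicode objects. It assumes a console encoding of cp437 and hard-codes it rather than reading sys.stdout.encoding, so it behaves the same on any machine. The sample string is made up.

```python
# Python 3 sketch; "encoding" stands in for sys.stdout.encoding on an
# XP command prompt, where it is typically cp437 (an assumption here).
encoding = "cp437"
text = "Caf\u00e9 \u20ac5"  # "Café €5": é exists in cp437, the euro sign does not

try:
    text.encode(encoding)  # roughly what print does before writing to the console
except UnicodeEncodeError as exc:
    bad = exc.object[exc.start:exc.end]
    print("cannot encode", ascii(bad), "in", encoding)  # cannot encode '\u20ac' in cp437

# One workaround: replace unencodable characters with "?" instead of failing.
safe = text.encode(encoding, errors="replace")
print(safe)  # b'Caf\x82 ?5'
```

Replacing characters loses information, so it only suits display. The real fix is to keep the data as Unicode and encode it only at the output boundary.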

Many thanks for the help :) I hadn't thought about the code page used
to display data in the DOS box in XP.

It turns out that the HTML page from which I was trying to extract
data using regexes was encoded in ISO-8859-1 instead of UTF-8, and the
SQLite wrapper only accepts Unicode strings, so it choked on the
accented (non-ASCII) characters.
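
A quick Python 3 illustration of why the declared charset matters. The sample byte string is made up, but it shows that bytes written as ISO-8859-1 are not valid UTF-8. Decoding with the right charset yields proper text that can then be handed to a Unicode-only API. In the Python 2 of this thread, the same decode() call turns a str into a unicode object.

```python
# The same bytes decode differently depending on which charset you assume.
raw = b"Caf\xe9"  # "Café" as ISO-8859-1: é is the single byte 0xE9

text = raw.decode("iso-8859-1")
print(text)  # Café

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # In UTF-8, 0xE9 starts a 3-byte sequence, so a lone 0xE9 is invalid.
    print("not valid UTF-8:", exc.reason)

# Once decoded, the text can be re-encoded as UTF-8 if needed: é takes two bytes.
utf8 = text.encode("utf-8")
print(utf8)  # b'Caf\xc3\xa9'
```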

For those interested, here's how I solved it, although there's likely
a smarter way to do it:

============
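# re_data, response, cursor and id are defined earlier in the script
data = re_data.search(response)
if data:
	name = data.group(1).strip()
	address = data.group(2).strip()

	# The page declares content="text/html; charset=iso-8859-1", so
	# decode the raw byte strings to unicode before handing them to SQLite
	name = name.decode('iso8859-1')
	address = address.decode('iso8859-1')

	# Note: the standard sqlite3 module's execute() runs only one statement;
	# with it, send the UPDATE alone and call the connection's commit()
	sql = 'BEGIN;'
	sql = sql + 'UPDATE companies SET name=?,address=? WHERE id=?;'
	sql = sql + "COMMIT"

	try:
		cursor.execute(sql, (name, address, id))
	except:
		print "Failed UPDATING"
		raise
else:
	print "Pattern not found"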
============
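
For comparison, here is a minimal Python 3 sketch of the same steps using the standard sqlite3 module. It reads the charset from a meta tag like the one in the code's comment, instead of hard-coding iso8859-1. The sample page, both regexes, the page_charset() helper and the in-memory companies table are all hypothetical stand-ins for the script's real response, re_data and database. It sends the UPDATE as a single statement, since sqlite3's execute() accepts only one, and lets the connection's context manager handle the commit.

```python
import re
import sqlite3

# Hypothetical page and pattern, standing in for the post's response/re_data.
response = (b'<meta http-equiv="Content-Type" '
            b'content="text/html; charset=iso-8859-1">'
            b'<b>Caf\xe9 M\xfcller</b><i>Stra\xdfe 1</i>')
re_data = re.compile(rb"<b>(.*?)</b><i>(.*?)</i>")
re_charset = re.compile(rb"charset=([\w-]+)", re.IGNORECASE)

def page_charset(page, default="utf-8"):
    """Return the charset declared in the page, or a default (hypothetical helper)."""
    m = re_charset.search(page)
    return m.group(1).decode("ascii") if m else default

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO companies (id) VALUES (1)")

company_id = 1  # renamed from "id" to avoid shadowing the built-in
data = re_data.search(response)
if data:
    charset = page_charset(response)
    name = data.group(1).strip().decode(charset)
    address = data.group(2).strip().decode(charset)
    # "with conn" commits on success and rolls back if an exception escapes;
    # execute() runs exactly one statement, so no BEGIN/COMMIT in the SQL.
    with conn:
        conn.execute("UPDATE companies SET name=?, address=? WHERE id=?",
                     (name, address, company_id))
else:
    print("Pattern not found")

row = conn.execute("SELECT name, address FROM companies WHERE id=1").fetchone()
```

Reading the declared charset means the script keeps working if the site switches to UTF-8. The charset from the HTTP Content-Type header, or a proper HTML parser, would likely be more robust than this regex.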

Thanks again.



More information about the Python-list mailing list