Annoying unicode / str conversion problem

Mon Jan 26 15:21:01 EST 2009

Hi Python experts,

At the moment I'm struggling with an annoying problem in connection with MySQL.

I'm fetching rows from a database, which the MySQL driver returns as a list of tuples.

The default encoding of the database is utf-8.

Unfortunately, the database contains rows in different encodings, and there is a blob
column.

In the app I search for duplicate entries in the database with this code:

hash = {}
cursor.execute("select * from table")
rows = cursor.fetchall()
for row in rows:
	key = "|".join([str(x) for x in row])		# <- here the problem arises
	if key in hash:
		print "found double entry"
	else:
		hash[key] = row		# remember the row so later duplicates are detected

This code works as expected with Python 2.5.2.
With Python 2.5.1 it raises this error:

key = "|".join(str(x) for x in row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u017e' in position 3: ordinal
not in range(128)

When I replace the str() call with unicode(), I get this error as soon as a blob column is
processed:

key = "|".join(unicode(x) for x in row)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 119: ordinal not in
range(128)
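
Both tracebacks look like Python's implicit ASCII conversion at work: str() on a unicode
object encodes it with the default codec, and unicode() on a byte string decodes it with
the default codec, which is 'ascii' unless it has been changed. The sketch below reproduces
both failures without a database by making those conversions explicit. The sample values
and the helper fails_with() are made up for illustration, and the single Latin-1 byte only
stands in for whatever the blob column really holds.

```python
# Made-up sample values standing in for real column data.
text_value = u"\u017e"                    # a unicode string, as in the first traceback
blob_value = u"\xfc".encode("latin-1")    # the single byte 0xfc, as in the second traceback


def fails_with(exc_type, func):
    """Return True if calling func() raises exc_type (hypothetical helper)."""
    try:
        func()
    except exc_type:
        return True
    return False


# What str(x) does implicitly to a unicode object in Python 2.
encode_fails = fails_with(UnicodeEncodeError, lambda: text_value.encode("ascii"))

# What unicode(x) does implicitly to a byte string in Python 2.
decode_fails = fails_with(UnicodeDecodeError, lambda: blob_value.decode("ascii"))

print("encode to ascii fails: %s" % encode_fails)
print("decode from ascii fails: %s" % decode_fails)
```

If this reading is right, neither str() nor unicode() can be applied blindly to a row that
mixes unicode text and raw bytes.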

Please help: how can I convert ANY column data to a string that is usable as a key in a
dictionary? The purpose of using a dictionary is to find equal rows in some database
tables. Perhaps using an MD5 hash of the column data is also an idea?
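
A minimal sketch of the MD5 idea, offered as one possible approach rather than a tested
fix: build the key with repr() instead of str() or unicode(), since repr() escapes
non-ASCII characters and bytes instead of converting between the two types, and then feed
that representation to hashlib.md5 to keep the keys short. The function names and the
sample rows are hypothetical; the rows only imitate what cursor.fetchall() might return.

```python
import hashlib


def row_key(row):
    """Build a key from a row without encoding or decoding its values."""
    return "|".join([repr(x) for x in row])


def row_digest(row):
    """Return an MD5 hex digest of the same representation, for shorter keys."""
    return hashlib.md5(row_key(row).encode("utf-8")).hexdigest()


def find_duplicates(rows):
    """Return the rows whose digest was already seen earlier in rows."""
    seen = {}
    duplicates = []
    for row in rows:
        key = row_digest(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen[key] = row
    return duplicates


# Hypothetical rows: (id-like value, unicode text, blob bytes).
blob = u"\xfc".encode("latin-1")
rows = [
    (1, u"\u017eaba", blob),
    (2, u"plain", blob),
    (1, u"\u017eaba", blob),   # same content as the first row
]

duplicates = find_duplicates(rows)
print("found %d double entry" % len(duplicates))
```

One caveat with this design: repr() also encodes the type of each value, so two rows that
hold the same text once as unicode and once as a byte string would not be reported as
equal. Using the row tuple itself as the dictionary key may work as well, provided every
column value is hashable.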

Thanks a lot in advance,

Hans.