Annoying unicode / str conversion problem

Benjamin Kaplan benjamin.kaplan at case.edu
Mon Jan 26 21:40:41 CET 2009


On Mon, Jan 26, 2009 at 3:21 PM, Hans Müller <heintest at web.de> wrote:

> Hi python experts,
>
> at the moment I'm struggling with an annoying problem involving MySQL.
>
> I'm fetching rows from a database, which the MySQL driver returns as a list
> of tuples.
>
> The default encoding of the database is utf-8.
>
> Unfortunately the database contains rows with different encodings, and
> there is a blob column.
>
> In the app I search for duplicate entries in the database with this code.
>
> hash = {}
> cursor.execute("select * from table")
> rows = cursor.fetchall()
> for row in rows:
>        key = "|".join([str(x) for x in row])   # <- the problem arises here
>        if key in hash:
>                print "found double entry"
>        hash[key] = row
>
> This code works as expected with python 2.5.2
> With 2.5.1 it shows this error:
>
>
> key = "|".join(str(x) for x in row)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u017e' in
> position 3: ordinal
> not in range(128)
>
> When I replace the str() call with unicode(), I get this error when a blob
> column is being processed:
>
> key = "|".join(unicode(x) for x in row)
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 119:
> ordinal not in
> range(128)
>
>
> Please help: how can I convert ANY column data to a string which is usable
> as a key to a dictionary? The purpose of using a dictionary is to find equal
> rows in some database tables. Perhaps using an MD5 hash of the column data
> is also an idea?


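unicode() takes an optional encoding argument. If you don't specify one, it
uses the default encoding, which is normally ASCII. Passing an encoding only
works for byte strings, though, so values that are already unicode, numbers,
or dates need a plain unicode() call. The "replace" error handler keeps blob
bytes that are not valid UTF-8 from raising. Try using (untested):

key = u"|".join(
    x.decode("utf-8", "replace") if isinstance(x, str) else unicode(x)
    for x in row)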
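
Since you mentioned an MD5 hash, here is a rough sketch of that idea, which
avoids decoding the blob at all. It is only a sketch under some assumptions:
the helper names (to_bytes, row_key, find_duplicates) are made up for this
example and are not part of any MySQL driver, text columns are assumed to be
safe to encode as UTF-8, and blobs are assumed to arrive as plain byte
strings.

```python
import hashlib

# Hypothetical helpers for illustration; none of this is driver API.
TEXT_TYPE = type(u"")                   # unicode on Python 2, str on Python 3
BYTES_TYPE = type("".encode("ascii"))   # str on Python 2, bytes on Python 3
SEP = ":".encode("ascii")


def to_bytes(value):
    """Convert one column value to bytes without an implicit ASCII step."""
    if isinstance(value, BYTES_TYPE):
        return value                     # blobs / raw byte strings: keep as-is
    if isinstance(value, TEXT_TYPE):
        return value.encode("utf-8")     # text: encode explicitly
    # Numbers, dates, None, ...: take the text form, then encode it.
    return TEXT_TYPE(value).encode("utf-8")


def row_key(row):
    """Return an MD5 hex digest usable as a dictionary key for a row."""
    digest = hashlib.md5()
    for value in row:
        data = to_bytes(value)
        # Length prefix so ("a|b", "c") and ("a", "b|c") hash differently.
        digest.update(str(len(data)).encode("ascii") + SEP + data)
    return digest.hexdigest()


def find_duplicates(rows):
    """Return the rows whose key was already seen earlier in rows."""
    seen = {}
    duplicates = []
    for row in rows:
        key = row_key(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen[key] = row
    return duplicates
```

You would call find_duplicates(cursor.fetchall()) in place of the loop. One
design caveat: a byte string and a text string with the same UTF-8 bytes get
the same key, and None is keyed like the text "None", so add a type tag per
field if that distinction matters for your tables.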


> Thanks a lot in advance,
>
> Hans.