Python 2.4 vs 2.5 - Unicode error

Gaurav Veda vedagaurav at gmail.com
Wed Jan 21 23:50:16 CET 2009


> The 0xc2 strongly suggests that you are feeding the beast data encoded
> in UTF-8 while giving it no reason to believe that it is in fact not
> encoded in ASCII. Curiously the first errant byte is a long way (4KB)
> into your data. Consider doing
>     print repr(data)
> to see what you've actually got there.

>>> sqlStr[4352:4362]
' and 25\xc2\xb0F'

All I want to do is replace every non-ASCII character with a space.
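
A minimal sketch of one way to do that, assuming the data really is UTF-8 (as the \xc2\xb0 pair suggests) and that decoding it first is acceptable. The helper name replace_non_ascii is made up for illustration. Decoding before substituting means each non-ASCII character becomes one space, rather than one space per byte.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r"[^\x00-\x7f]")


def replace_non_ascii(raw, encoding="utf-8"):
    """Decode raw bytes, then replace each non-ASCII character with a space.

    Assumption: `raw` is a byte string in `encoding` (UTF-8 by default).
    The result is text (unicode on 2.x, str on 3.x).
    """
    text = raw.decode(encoding)
    return _NON_ASCII.sub(" ", text)


# The slice shown above: the two bytes \xc2\xb0 are one character.
cleaned = replace_non_ascii(b" and 25\xc2\xb0F")
```

The cleaned text contains only ASCII characters, so encoding it to ASCII afterwards cannot fail.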

> I'm a little skeptical about the "2.4 works, 2.5 doesn't" notion --
> different versions of mysql, perhaps?

I am inserting content into the same MySQL server, running on machine
A, from machines B and C, which have different versions of Python. So
I don't think this is a MySQL issue.

> Show at the very least the full traceback that you get. Try to write a
> short script that demonstrates the problem with 2.5 and no problem
> with 2.4, so that (a) it is apparent what you are doing (b) the
> problem can be reproduced if necessary by someone with access to
> mysql.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "putDataIntoDB.py", line 164, in <module>
    cursor.execute(sqlStr)
  File "/usr/lib64/python2.5/site-packages/MySQLdb/cursors.py", line 146, in execute
    query = query.encode(charset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 4359: ordinal not in range(128)
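
A rough sketch of why the position in the traceback lines up with the slice above. On Python 2, calling .encode() on a byte string first decodes it with the default ASCII codec, which would explain a UnicodeDecodeError coming out of an encode call. The helper first_undecodable is a hypothetical name, and the snippet only reproduces the decode step, not MySQLdb itself.

```python
def first_undecodable(raw, encoding="ascii"):
    """Return (offset, byte) for the first byte that fails to decode, else None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte.
        return exc.start, raw[exc.start:exc.start + 1]
    return None


# The slice printed earlier, which started at offset 4352 of sqlStr.
chunk = b" and 25\xc2\xb0F"
SLICE_START = 4352

ascii_failure = first_undecodable(chunk, "ascii")
utf8_failure = first_undecodable(chunk, "utf-8")

# Offset within the slice plus the slice start gives the absolute position.
absolute_position = SLICE_START + ascii_failure[0]
```

The ASCII decode stops at the 0xc2 byte, and the absolute position comes out as 4359, the same one reported in the traceback. Decoding the same bytes as UTF-8 raises nothing.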

> You might like to explain why you think that doubling backslashes in
> your SQL is a good idea, and amplify "some processing on the text".

I thought this would achieve two things.
a) It would escape any non-ASCII character. (Obviously, I was wrong.
I got carried away by the repr display and thought \xc2 would get
escaped to \\xc2, which is completely preposterous.)
b) It would make sure that the escape sequences in the string (e.g.
'\n') are received by MySQL as escape sequences.
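
For point b), a hedged sketch of the usual alternative to escaping by hand: pass the value as a bound parameter and let the driver do the quoting. The example uses the standard-library sqlite3 module with an in-memory database purely so that it runs anywhere; that is a stand-in, not what putDataIntoDB.py does. With MySQLdb the equivalent call is cursor.execute(query, args) with %s placeholders instead of ?. The table name notes and the helper roundtrip are made up for illustration.

```python
import sqlite3


def roundtrip(text):
    """Insert text through a bound parameter and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE notes (body TEXT)")
        # The value travels separately from the SQL text, so backslashes,
        # quotes and newlines need no manual escaping.
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
        return conn.execute("SELECT body FROM notes").fetchone()[0]
    finally:
        conn.close()


sample = u"line one\nand 25\u00b0F with a back\\slash and a 'quote'"
stored = roundtrip(sample)
```

The stored value comes back identical to the input, newline and backslash included, without any doubling.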

Thanks for your reply!
Gaurav

> HTH,
> John



