Python 2.4 vs 2.5 - Unicode error

John Machin sjmachin at lexicon.net
Wed Jan 21 15:55:15 EST 2009


On Jan 22, 4:49 am, Gaurav Veda <vedagau... at gmail.com> wrote:
> Hi,
>
> I am trying to put some webpages into a mysql database using python
> (after some processing on the text). If I use Python 2.4.2, it works
> without a fuss. However, on Python 2.5, I get the following error:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 4357: ordinal not in range(128)
>
> Before sending the (insert) query to the mysql server, I do the
> following which I think should've taken care of this problem:
>  sqlStr = sqlStr.replace('\\', '\\\\')
>
> (where sqlStr is the query).
>
> Any suggestions?

The 0xc2 strongly suggests that you are feeding the beast data encoded
in UTF-8 without telling it so, which leaves it to assume ASCII.
Curiously, the first errant byte is a long way (about 4 KB) into your
data. Consider doing
    print repr(data)
to see what you've actually got there.
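
As a rough illustration of what to look for, here is a small sketch. The
sample bytes and the helper name first_non_ascii are made up for this
example and are not taken from your data; it assumes a Python recent
enough to have bytearray and "except ... as".

```python
# Hypothetical sample: ASCII text followed by the UTF-8 encoding of
# U+00A0 (no-break space), which is the two bytes 0xc2 0xa0.
data = b"price:\xc2\xa0100"


def first_non_ascii(raw):
    """Return (offset, byte value) of the first byte >= 0x80, or None."""
    for offset, byte in enumerate(bytearray(raw)):
        if byte >= 0x80:
            return offset, byte
    return None


# Decoding as ASCII fails on the first byte outside range(128).
try:
    data.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the encoding the bytes are actually in succeeds.
decoded = data.decode("utf-8")
```

If first_non_ascii(data) reports 0xc2 or 0xc3 followed by a byte in the
range 0x80 to 0xbf, UTF-8 is the likely encoding, and the data needs an
explicit decode (or the database connection needs to be told the
charset) before it is mixed with unicode strings.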

I'm a little skeptical about the "2.4 works, 2.5 doesn't" notion --
different versions of mysql, perhaps?

Show at the very least the full traceback that you get. Try to write a
short script that demonstrates the problem with 2.5 and no problem
with 2.4, so that (a) it is apparent what you are doing and (b) the
problem can be reproduced if necessary by someone with access to
mysql.

You might like to explain why you think that doubling backslashes in
your SQL is a good idea, and amplify "some processing on the text".
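
For comparison, a minimal sketch of letting the DB-API driver do the
quoting instead of editing the SQL string by hand. It uses the
standard-library sqlite3 module purely as a stand-in so that it runs
without a server; the table, column names and sample text are invented
for the example. MySQLdb follows the same pattern but uses %s as the
placeholder rather than ?.

```python
import sqlite3

# In-memory database as a stand-in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, body TEXT)")

# Hypothetical page text containing backslashes, a quote and a non-ASCII
# character (U+00A0), i.e. everything that hand-escaping tends to get wrong.
body = u"C:\\temp\\file, it's here,\xa0really"

# The values are passed separately from the SQL text, so no manual
# escaping (doubling backslashes, quoting) is needed.
conn.execute(
    "INSERT INTO pages (url, body) VALUES (?, ?)",
    (u"http://example.com/", body),
)

# Read the row back to confirm the text survived unchanged.
stored = conn.execute("SELECT body FROM pages").fetchone()[0]
```

Here stored compares equal to body, backslashes and all, without any
replace() call on the query string.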

HTH,
John


