handling unicode data

Filipe fcorreia at gmail.com
Fri Jun 30 18:14:28 CEST 2006


Marc 'BlackJack' Rintsch wrote:
> The `unicode()` call doesn't fail here but the ``print`` because printing
> unicode strings means they have to be encoded into a byte string again.
> And whatever encoding the target of the print (your console) uses, it
> does not contain the unicode character u'\xd8'.  From the traceback it
> seems your terminal uses `cp437` as encoding.
>
> As you can see here: http://www.wordiq.com/definition/CP437 there's no Ø
> in that character set.

Some things are much, much clearer to me now. Thanks!
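
To check the point about the console encoding, here is a minimal sketch.
It is written for Python 3 (the code elsewhere in this thread is Python 2),
and the helper name can_encode is made up for illustration. It only tests
whether a character exists in a given code page, which is what decides
whether printing it to a console using that code page can succeed.

```python
# Python 3 sketch; can_encode is a hypothetical helper, not a library function.
char = "\u00d8"  # the character Ø from the traceback


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp437 has no Ø, while cp850 and iso-8859-1 both do.
for codec in ("cp437", "cp850", "iso-8859-1"):
    print(codec, can_encode(char, codec))
```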

For future reference, these links may also help:
http://www.jorendorff.com/articles/unicode/python.html
http://www.thescripts.com/forum/thread23314.html

I've changed my Windows console code page to latin1, and the following
prints now output "França", as expected:
print unicode("Fran\x87a", "cp850").encode("iso-8859-1")
print unicode("Fran\xe7a", "iso-8859-1").encode("iso-8859-1")
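
The two prints work because both byte strings decode to the same unicode
string, and only the final encode has to match the console. A small sketch
of that, rewritten for Python 3 (bytes literals and .decode() instead of
unicode(); the variable names are mine):

```python
# Python 3 equivalent of the two Python 2 prints above.
from_cp850 = b"Fran\x87a".decode("cp850")        # 0x87 is c-cedilla in cp850
from_latin1 = b"Fran\xe7a".decode("iso-8859-1")  # 0xE7 is c-cedilla in latin1

# Different bytes on the way in, the same text once decoded.
print(from_cp850 == from_latin1)

# ascii() avoids depending on what the current console can display.
print(ascii(from_cp850))

# Encoding for a latin1 console yields the latin1 byte in both cases.
print(from_cp850.encode("iso-8859-1"))
```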

However, I don't yet fully understand what's happening with Pymssql.
I expected MS SQL Server to return cp850 data (the column in question
uses the collation SQL_Latin1_General_CP850_CS_AS), but that doesn't
seem to be what the query returns. I tried decoding the result to a
unicode string from a few different encodings, but none of them seems
to be the right one. For example, for cp850, using a latin1 console:

--------------------------------------------------------
term = unicode(row[1], "cp850")
print repr(term)
print term

---- output -------------------------------------------
u'Fran\xcfa'
FranÏa
--------------------------------------------------------


And for iso-8859-1 (I got the same result with mbcs):
--------------------------------------------------------
term = unicode(row[1], "iso-8859-1")
print repr(term)
print term

---- output -------------------------------------------
u'Fran\xd8a'
FranØa
--------------------------------------------------------
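
One way to narrow this down is to work backwards from the two repr outputs
above to the raw bytes that must have been in row[1]. The sketch below does
that in Python 3 (under Python 2, print repr(row[1]) would show the bytes
directly). The variable names are mine, and the unicode values are copied
from the outputs above rather than read from the database.

```python
# Python 3 sketch: re-encode each decoded result with the codec that
# produced it, which recovers the original byte string.
decoded_as_cp850 = "Fran\u00cfa"   # repr shown for the cp850 attempt
decoded_as_latin1 = "Fran\u00d8a"  # repr shown for the iso-8859-1 attempt

raw_from_cp850 = decoded_as_cp850.encode("cp850")
raw_from_latin1 = decoded_as_latin1.encode("iso-8859-1")

# Both attempts started from the same bytes, with 0xD8 in the fifth position.
print(raw_from_cp850, raw_from_latin1)

# Bytes that "França" would have in the two encodings tried so far.
expected_cp850 = "Fran\u00e7a".encode("cp850")        # b'Fran\x87a'
expected_latin1 = "Fran\u00e7a".encode("iso-8859-1")  # b'Fran\xe7a'
print(expected_cp850, expected_latin1)
```

So the byte arriving from the query is 0xD8, which is the c-cedilla in
neither cp850 (0x87) nor iso-8859-1 (0xE7). This does not say where the
conversion happens, only that the data is already altered by the time it
reaches Python.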


What do you think? Might it be Pymssql doing something wrong?



