Unicode / cx_Oracle problem

Diez B. Roggisch deets at nospam.web.de
Sun Sep 10 23:17:14 CEST 2006

> Value of the variable 'id' is  u'\ufeff'
> Value of the variable 'mean' is  u'('

So they both are unicode objects - as I presumed.
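
As an aside: u'\ufeff' is U+FEFF, the byte order mark. My guess (nothing
in this thread confirms it) is that the input file starts with a BOM and
it ended up in the first field. A minimal sketch of getting rid of it,
with made-up sample data and a hypothetical helper strip_bom:

```python
# -*- coding: utf-8 -*-
import codecs

BOM = u"\ufeff"  # U+FEFF, the byte order mark


def strip_bom(text):
    """Return text without a leading U+FEFF, if one is present."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text


# Simulate the bytes of a UTF-8 file that starts with a BOM.
# The content is hypothetical sample data, not the poster's file.
raw = codecs.BOM_UTF8 + u"42\t(example)".encode("utf-8")

decoded_plain = raw.decode("utf-8")    # plain utf-8 keeps the BOM as U+FEFF
decoded_sig = raw.decode("utf-8-sig")  # utf-8-sig drops a leading BOM

print(repr(decoded_plain[0]))
print(repr(strip_bom(decoded_plain)))
print(repr(decoded_sig))
```

If the file really is UTF-8 with a BOM, decoding with "utf-8-sig" is
probably the simpler of the two options.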

> It's very hard to figure out what to do on the basis of complexities
> on the order of
> http://download-east.oracle.com/docs/cd/B25329_01/doc/appdev.102/b25108/xedev_global.htm#sthref1042
> (tiny equivalent http://tinyurl.com/fnc54)

Yes, that is somewhat intimidating.

> But I'm not even sure I got that far. My problems so far seem prior:
> in Python or Python's cx_Oracle driver. To be candid, I'm very tempted
> at this point to abandon the Python effort and revert to an all-ucs2
> environment, much as I dislike Java and C#'s complexities and the poor
> support available for all-Java databases.

That actually won't help you much, I guess. Just because JDBC converts
Java's unicode strings to byte strings behind the curtains doesn't mean
the problem goes away: you can still lose characters, especially if the
DB itself isn't running an encoding that can represent all possible
unicode characters.

>> Then you need to encode the unicode string before passing it - something 
>> like this:
>> mean = mean.encode("latin1")
> I don't see how the Chinese characters embedded in the English text
> will carry over if I do that.

Me neither, but how could I have foreseen that? So use something else 
instead - utf-8 for example, or whatever the oracle connection will grok.
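
To make the difference concrete, here is a small sketch. The sample
string is made up, and whether the Oracle connection actually accepts
utf-8 bytes is an assumption you would have to check against your own
setup:

```python
# -*- coding: utf-8 -*-

# Hypothetical sample: English text with two embedded Chinese characters.
mean = u"(\u4e2d\u6587) Chinese"

# latin1 has no code points for the Chinese characters, so encoding fails.
try:
    mean.encode("latin1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# utf-8 can represent every unicode character, so nothing is lost.
encoded = mean.encode("utf-8")        # byte string to hand to the driver
round_trip = encoded.decode("utf-8")  # decoding gives the original back

print(latin1_ok)
print(round_trip == mean)
```

The point is only that the byte encoding has to be one that covers all
the characters in your data; which one the connection expects is a
separate question.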

I think you should read up on what unicode and encodings are, how they
work in Python and, unfortunately, how they work in Oracle. Even if you
use Java, not understanding how these things are connected will come
back to bite you at some point.

