[DB-SIG] Unicode interaction with the python DBAPI?
misa at redhat.com
Mon Jul 26 16:28:12 CEST 2004
Sooner or later it had to happen: I need to store data parsed from an XML
document in the database. XML parsers produce Unicode strings for pretty
much everything (tag names, attributes, character data), which, from my
limited knowledge of XML, is correct.
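As an illustration of what reaches the driver, here is a small sketch using the standard library's ElementTree. It is written in modern Python, where `str` is Unicode text and `bytes` is encoded data. The sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

# The document arrives as UTF-8 bytes, as it would from a file or socket.
raw = b'<?xml version="1.0" encoding="utf-8"?><item lang="fr">caf\xc3\xa9</item>'
root = ET.fromstring(raw)

# Tag name, attribute value and character data all come back as text (str),
# already decoded according to the document's declared encoding.
for value in (root.tag, root.get("lang"), root.text):
    print(type(value).__name__, repr(value))
```

This prints `str 'item'`, `str 'fr'` and `str 'café'`. Every value that ends up as a query parameter is therefore Unicode text, not bytes in some database encoding.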
I am using cx_Oracle, by the way, which is DB-API 2.0 compliant.
What is the driver supposed to do when it receives Unicode data?
There are a couple of variables here. Oracle converts between encodings on
the fly, based on the character set named in your NLS_LANG environment
variable. It looks like Oracle does not let you change the session character
set (via ALTER SESSION) once the connection has been established, so the
character set is effectively fixed for the life of a connection. I am not sure
how other database backends handle this.
So I suppose the driver should know the session character set and convert
Unicode strings into that encoding before sending them?
For instance, if my NLS_LANG is American_America.UTF8, the driver would encode
the Unicode data as UTF-8, and the database would then convert it to whatever
character set the database itself uses for storage.
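The scheme described above could look roughly like the following sketch. It is written in modern Python, and `session_codec`, `encode_params` and the charset table are hypothetical names invented for the illustration, not part of cx_Oracle or the DB-API. It assumes NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET and maps only a few Oracle character set names to Python codecs.

```python
import os

# Hypothetical, deliberately tiny mapping from Oracle character set names
# (the part of NLS_LANG after the dot) to Python codec names.
# A real driver would need a far more complete table; treating Oracle's
# "UTF8" as plain UTF-8 is a simplification (AL32UTF8 is the standard one).
ORACLE_TO_PYTHON_CODEC = {
    "UTF8": "utf-8",
    "AL32UTF8": "utf-8",
    "WE8ISO8859P1": "iso-8859-1",
    "US7ASCII": "ascii",
}


def session_codec(nls_lang=None):
    """Return the Python codec implied by NLS_LANG.

    NLS_LANG is assumed to look like 'American_America.UTF8';
    only the CHARSET part after the dot matters here.
    """
    if nls_lang is None:
        nls_lang = os.environ.get("NLS_LANG", "")
    _, dot, charset = nls_lang.partition(".")
    if not dot or not charset:
        raise ValueError("NLS_LANG has no character set part: %r" % nls_lang)
    try:
        return ORACLE_TO_PYTHON_CODEC[charset.upper()]
    except KeyError:
        raise ValueError("unmapped Oracle character set: %r" % charset) from None


def encode_params(params, codec):
    """Encode text parameters to bytes in the session encoding.

    Non-text values (numbers, None, ...) pass through unchanged.
    Characters the session encoding cannot represent raise
    UnicodeEncodeError instead of being silently replaced.
    """
    return [p.encode(codec) if isinstance(p, str) else p for p in params]


codec = session_codec("American_America.UTF8")
print(encode_params(["café", 42, None], codec))  # [b'caf\xc3\xa9', 42, None]
```

Failing loudly on unrepresentable characters is one design choice. A driver could instead substitute a replacement character, but that would silently lose data.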
Does that sound like a plausible scenario? And going the other way, should the
driver also return all character data as Unicode instead of plain (byte)
strings when fetching result sets?
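The fetch side of the question might be handled by a thin wrapper like the sketch below. It is written in modern Python, and `UnicodeCursor` is a hypothetical class, not an existing driver API. It assumes the session codec is already known and decodes any bytes column into text. sqlite3 is used only as a stand-in for a driver that returns encoded bytes: a bound bytes value is stored as a BLOB and comes back as bytes.

```python
import sqlite3


class UnicodeCursor:
    """Hypothetical wrapper around any DB-API 2.0 cursor that decodes
    bytes columns into text using a fixed session codec."""

    def __init__(self, cursor, codec):
        self._cursor = cursor
        self._codec = codec

    def _decode(self, row):
        # fetchone() signals "no more rows" with None; pass that through.
        if row is None:
            return None
        return tuple(
            col.decode(self._codec) if isinstance(col, bytes) else col
            for col in row
        )

    def execute(self, *args):
        self._cursor.execute(*args)
        return self

    def fetchone(self):
        return self._decode(self._cursor.fetchone())

    def fetchall(self):
        return [self._decode(row) for row in self._cursor.fetchall()]


conn = sqlite3.connect(":memory:")
cur = UnicodeCursor(conn.cursor(), "utf-8")
cur.execute("SELECT ?, ?", (b"caf\xc3\xa9", 7))
print(cur.fetchone())  # ('café', 7)
```

Non-text columns such as numbers pass through untouched, so only character data changes type.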
I couldn't find anything Unicode-related in the Database API specification.
Should it say something about this?