[DB-SIG] Controlling return types for DB APIs

Carsten Haese carsten at uniqsys.com
Sat Apr 21 03:34:02 CEST 2007


Now that it's the weekend, I'd like to chime in. I had been thinking for
a while now about type conversions from the Informix angle. Since
Informix allows user-defined types, I'd like to implement a type
conversion scheme that is flexible enough to allow specifying type
conversions for any data type including UDTs.

Since I didn't consider myself creative enough to come up with a good
design by myself, I looked over the fence at other database
implementations, including PostgreSQL and sqlite, and to my utter
surprise, the one scheme that seems most Pythonic to me is the way JDBC
handles user-defined types.

The idea is that the programmer sets up type mappings. In Python, a type
mapping would just be a dictionary whose key is the database type (more
on that later) and whose value is a class object that derives from an
abstract SQLData class that would be defined by the API module. This
mapping would be stored on the connection as a default mapping, and the
connection's cursors will inherit shallow copies of this mapping.

A natural choice for the key in this mapping is the type indicator that
the API implementation already returns in cursor.description. The only
possible hangup would be if a DB-API implementation uses mutable objects
for these, but in my opinion that would be insane. All implementations
I'm aware of either use strings or integers for the SQL type indicator.

When a value is returned from the database, the computer checks if its
type is mapped. If yes, the constructor of the corresponding
SQLData-derived class is called with the value's "canonical" Python
representation as the only argument. The canonical representation is the
value that the API would return if no type map were in effect, which
would be the best, lossless Python equivalent of the data type in
question.

The SQLData-derived class may, of course, return an object of a
different type of object from its __new__ method (which would be useful
to map character data to unicode objects, for example), but in order to
allow seamless round-trips of data from the database to the application
and back to the database, the returned value should be directly usable
as an input parameter for the type of column that it came from.

For handling type conversions to the database, SQLData instances would
implement a ToDB method that would perform the reverse operation of the
constructor, i.e. to render the canonical Python representation of the
instance's contents, which can then be bound to input parameters in the
canonical way. 

This proposal does not address special per-column mappings, but I don't
think it needs to. In my experience it's rare that I'd want two columns
of the same type from the same query to be mapped to two different
Python types.

For handling exceptional circumstances, say e.g. you inherit a messed up
database that stores timestamps as nanoseconds since the big bang that
you automatically want to convert to a datetime object, I suggest
standardizing the concept of row factory functions. In a nutshell,
cursor objects would have an optional callable rowfactory attribute. If
a rowfactory is specified, it will translate between what a fetch would
normally return and what it should return instead.

Let me know what you think.

-Carsten




More information about the DB-SIG mailing list