[DB-SIG] Controlling return types for DB APIs

Carsten Haese carsten at uniqsys.com
Sat Apr 21 18:18:01 CEST 2007


On Sat, 2007-04-21 at 10:52 -0400, Michael Bayer wrote:
> On Apr 21, 2007, at 3:09 AM, Carsten Haese wrote:
> 
> > Hi All,
> >
> > I have taken the time to write out my type mapping proposal in a
> > slightly more structured form, with some hopefully enlightening  
> > examples
> > of how this proposal might be useful.
> >
> > Please see http://www.uniqsys.com/~carsten/typemap.html
> >
> > Any comments are welcome, and I'll do my best to incorporate
> > constructive criticism into future revisions of this proposal.
> >
> 
> here are some thoughts:

Thanks for taking the time to read the proposal. You are making some
good points. If you don't want to read all my responses, skip to the
summary at the bottom.

> - using a class-level approach, i.e. SQLData, makes it inconvenient
> to establish custom types that are independent of a particular DBAPI,
> since the SQLData class itself, like everything else in DBAPI, only
> exists within implementations.  It's impossible to define the class
> until you've imported some DBAPI.  SQLData's origins in JDBC don't
> have this issue since SQLData is part of the abstract JDBC API and
> classes can be built against it independently of any database driver
> being available.

It's not impossible. You could always mix the somedb.SQLData base
class into a generic Python object after importing somedb. Also, since
the point is for an application to define type mappings for a particular
database, I don't see the harm in making the SQLData class specific to
the API module, which the application will usually have imported
already anyway.
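
For illustration, a mix-in along these lines (all of the names below,
including somedb and SpatialData, are made up):

import somedb
from myapp.geo import SpatialData   # hypothetical DB-agnostic class

class SomedbSpatialData(SpatialData, somedb.SQLData):
    """SpatialData that also knows how to store itself in somedb."""
    def ToDB(self):
        # Translate to a canonical type that somedb understands,
        # e.g. a well-known-text string.
        return self.to_wkt()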

In my opinion, making the SQLData class specific to the API module is
necessary since the ToDB method for translating a particular object to
the database may differ from database to database.

> - because SQLData's state is the data itself, SQLData is not really a  
> "type" at all, its a value object which includes its own database  
> translation function.  That greatly limits what kinds of types  
> SQLData can be realistically used for, and in fact it can only be  
> used for datatypes that are explicitly aware that they are being  
> stored in a database - and only a specific DBAPI too.

Isn't that the point of defining a bidirectional type mapping from/to
the database?

> For example, it's impossible to use SQLData to directly represent
> Decimal instances or datetime instances; neither of them subclass  
> <mydbapi>.SQLData.

True, but on the one hand, datetime and Decimal instances should already
be handled by the API's canonical mapping, and on the other hand the
somedb API could always choose to return datetimes as objects that
derive from both datetime and somedb.SQLData.
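
For instance, something like this (DBTimestamp is an invented name, and
I'm assuming somedb.SQLData is an ordinary new-style class):

import datetime
import somedb

class DBTimestamp(datetime.datetime, somedb.SQLData):
    """A datetime that also knows how to render itself for somedb."""
    def ToDB(self):
        # Return whatever canonical form somedb expects for
        # timestamps, e.g. an ISO-8601 string.
        return self.isoformat(' ')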

>     If the answer is that we'd just use typemaps  
> for those, then what would we use SQLData for?  I can use a typemap  
> for my SpatialData objects just as easily, without my SpatialData  
> object being welded to a specific persistence scheme and specific DBAPI.

Of course you could, but "welding" the object to a specific DB-API is
what allows the object to be passed transparently as an input parameter
into queries against that specific database.

A corollary of the principle of least surprise is that it should always
be possible to take the result of a select query and insert that object
into the column that it was read from. Inheriting from SQLData is what
allows this seamless select-insert round-trip.

Having said all that, I'm not married to the idea of requiring the
application-side objects to derive from a particular SQLData class. For
the purpose of input binding, it would be enough, and more in line with
the idea of duck-typing, if the object provided an agreed-upon method,
e.g. "ToDB" that the DB-API can call to translate between application
type and canonical database type.

Essentially, the proposed input translation could change to

if hasattr(in_param, "ToDB"):
    in_param = in_param.ToDB()

or something like that. It may be beneficial to allow this call to pass
more parameters in order to tell the object something about the context
in which the conversion is occurring, including but not limited to the
name of the API module, the active connection/cursor, and, if available,
the descriptor of the database-side column type and name that the object
is destined for.
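
To make that concrete, here is a rough sketch of what the binding step
inside a driver's execute() might do under this proposal (the context
argument and its contents are only placeholders, not a finished spec):

def _translate_in_param(in_param, context):
    # context is a hypothetical object carrying the API module, the
    # active connection/cursor, and (if known) the description of the
    # target column.
    if hasattr(in_param, "ToDB"):
        return in_param.ToDB(context)
    # No ToDB method: fall through to the canonical input mapping.
    return in_param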

> Also because SQLData is not stateful with regards to its type, it's
> not possible for a single SQLData class to represent variants of a
> particular type, such as strings that should be truncated to length
> 50 versus strings that are truncated to length 100; you'd have to use
> more subclassing.

As proposed, SQLData *is* stateful with regards to the type. By default,
it's not stateful with regards to subtype, length, and precision, but
this can, and should, be added. If the constructor is given the complete
cursor.description entry that goes along with the value, it has
everything it needs to remember this information.
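
As a sketch of what I mean (the attribute names are illustrative only):

class SQLData(object):
    """Base class for values that remember their database type."""
    def __init__(self, value, description=None):
        # description is the 7-tuple from cursor.description:
        # (name, type_code, display_size, internal_size,
        #  precision, scale, null_ok)
        self.value = value
        self.description = description
    def ToDB(self):
        # Default behavior: hand the raw value back to the driver.
        return self.value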
 
> - per-column typemaps:  here is a common use case.  I am receiving a
> row which contains two BLOB columns.  One BLOB is image data
> representing a JPEG image, one BLOB is a pickled instance of a Python
> class.  I would like to register type converters so that the second
> column is run through the pickle.loads() function but not the
> first.  If we are registering various type-handling callables at the
> cursor level, it should be easy enough to add an optional integer
> parameter which will bind that type converter to only a specific
> column position in a result set.  The use case is more obvious in the
> bind parameter direction.

Yes, I already suggested this (passing the column number to the outbound
adapter) as a possible extension. However, the use case is convincing
enough that we should probably provide a more convenient per-column
mapping that dispatches the conversion to a different adapter callable
altogether, rather than requiring one adapter that returns one thing or
another depending on which column it's converting.

To handle this, the cursor could grow a coltypemap attribute, which is a
mapping of typemaps, keyed on the column number or, maybe more
conveniently, column name.
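
Roughly, the fetch side could then dispatch along these lines. This is
only an illustration; it assumes the cursor-level typemap is keyed on
the column's type_code, and everything named below is hypothetical:

def _convert_row(cursor, row):
    converted = []
    for position, (desc, value) in enumerate(zip(cursor.description, row)):
        colname, type_code = desc[0], desc[1]
        # Prefer a column-specific typemap, looked up by name and then
        # by position, before falling back to the cursor-wide typemap.
        typemap = cursor.coltypemap.get(colname,
                  cursor.coltypemap.get(position, cursor.typemap))
        converter = typemap.get(type_code)
        if converter is not None:
            value = converter(value)
        converted.append(value)
    return tuple(converted)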

In summary, I am open to making the following revisions:
* The SQLData class would become optional or be eliminated. Inbound type
conversions between Python objects and the database will be performed by
a well-defined ToDB method that the object may implement regardless of
its inheritance tree. If an inbound Python object doesn't define a ToDB
method, it'll be mapped by the canonical mapping for the particular
database.
* The outbound conversion call will receive additional parameters, such
as the cursor.description tuple, that will allow the adapter to make the
resulting object stateful with respect to all of its database type
properties.
* Add an optional coltypemap attribute to the cursor for defining a
column-specific typemap.

Unless I'm missing something, these revisions should address all the
points you have brought up.

-Carsten



