[DB-SIG] Controlling return types for DB APIs

Michael Bayer mike_mp at zzzcomputing.com
Sat Apr 21 19:33:04 CEST 2007


On Apr 21, 2007, at 12:18 PM, Carsten Haese wrote:

>
> Thanks for taking the time to read the proposal. You are making some
> good points. If you don't want to read all my responses, skip to the
> summary at the bottom.
>

of course I'll read it fully! :)

>
> In my opinion, making the SQLData class specific to the API module is
> necessary since the ToDB method for translating a particular object to
> the database may differ from database to database.

that's true, but there are also cases where it doesn't.  The
"PickleType" example is one where all we want to do is shuttle a
binary stream to/from the pickle API.  I also have an interest in
creating code that is database-portable... the connection objects
dealt with are produced by a factory, where the actual DBAPI module
is private to one particular implementation of that factory.
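
To make that concrete, here's a minimal sketch of the kind of
converter I mean - it shuttles any Python object to/from a binary
column via the pickle API, and nothing about it is specific to any
one DBAPI module (the class and method names here just follow the
proposal's ToDB/FromDB convention):

import pickle

class PickleConverter:
    # serialize the object into a byte stream for the BLOB column
    def ToDB(self, value):
        return pickle.dumps(value)

    # reconstruct the original object from the stored bytes
    def FromDB(self, value):
        return pickle.loads(value)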

>
>> - because SQLData's state is the data itself, SQLData is not really a
>> "type" at all, its a value object which includes its own database
>> translation function.  That greatly limits what kinds of types
>> SQLData can be realistically used for, and in fact it can only be
>> used for datatypes that are explicitly aware that they are being
>> stored in a database - and only a specific DBAPI too.
>
> Isn't that the point of defining a bidirectional type mapping from/to
> the database?

The point of bidirectional type mapping is to define database
persistence conversions for a particular application data type.  But
decoupling the application data type from a particular persistence
strategy allows the persistence mappings to vary independently of the
data type and of the rest of the application that depends on it.
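
Here's a sketch of the decoupling I mean (all names are hypothetical):
the application type stays untouched while each backend's factory
registers its own persistence strategy for it.

class SpatialData:
    # application-level value type; knows nothing about persistence
    def __init__(self, points):
        self.points = points

class OracleSpatial:
    # hypothetical converter for one backend
    def ToDB(self, value):
        return "POLYGON(%s)" % ", ".join("%s %s" % p for p in value.points)

class PGSpatial:
    # hypothetical converter for another backend; same type, different format
    def ToDB(self, value):
        return repr(value.points)

# swapping persistence strategies is a registry change, not a change
# to SpatialData or to the code that uses it
oracle_typemap = {SpatialData: OracleSpatial()}
pg_typemap = {SpatialData: PGSpatial()}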

>
>>     If the answer is that we'd just use typemaps
>> for those, then what would we use SQLData for ?  I can use a typemap
>> for my SpatialData objects just as easily, without my SpatialData
>> object being welded to a specific persistence scheme and specific  
>> DBAPI.
>
> Of course you could, but "welding" the object to a specific DB-API is
> what allows the object to be passed transparently as an input  
> parameter
> into queries against that specific database.
>
> A corollary of the principle of least surprise is that it should  
> always
> be possible to take the result of a select query and insert that  
> object
> into the column that it was read from. Inheriting from SQLData is what
> allows this seamless select-insert round-trip.

Well, in this case, SpatialData is related to its persistence
implementation through subclassing.  But it's just as easy for
SpatialData to be related to its persistence implementation via
association, without the subclassing requirement.  SQLAlchemy
provides full round-trip capability for any type you want and uses the
same mechanism for all types, including at the column level, without
any different treatment of "Python" types and user-defined types.
On both sides of the trip, all that's required is a dictionary that
maps TypeEngine subclasses (which purely define database translation
strategies) to either bind param names/positions and/or result column
names/positions.  Mapping to DBAPI/Python types is just one more way
of doing that (maybe I should look into adding that dimension to SA's
implementation...).
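
For illustration, the shape of it is roughly this (method names here
just mirror the ToDB/FromDB proposal - this is not SQLAlchemy's
actual API):

class Utf8String:
    # a pure translation strategy: behavior only, no value state
    def ToDB(self, value):
        return value.encode("utf-8")    # unicode in, bytes out to the driver
    def FromDB(self, value):
        return value.decode("utf-8")    # bytes from the driver, unicode back

# both directions of the round trip are driven by plain dictionaries,
# keyed on bind param name/position and result column name/position
bind_types = {"title": Utf8String(), 0: Utf8String()}
result_types = {"title": Utf8String(), 0: Utf8String()}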

>
> Essentially, the proposed input translation could change to
>
> if hasattr(in_param, "ToDB"):
>     in_param = in_param.ToDB()
>

OK, duck typing is much better and more analogous to JDBC's usage of
an interface.  This solves the module-importing issue, but not
necessarily the "different DBs might require different ToDB()
implementations" problem - it still binds my application-level value
objects to an assumption about their storage... and if my application
suddenly had to support two different databases, or even to persist
the same collections of objects in both of those DBs (there are
definitely apps that do this), now my program design has to create
copies of values to handle the discrepancy.  The same issue exists
for an application value that is stored in multiple places within the
same database, but in different ways, such as a Date type that is
stored both in some legacy table with a particular string-format
style of storage and in some newer table with a decimal-based storage
format (or a different string format).

A behind-the-scenes registry of converters mapped to my application's
types solves the multiple-databases problem, and bind/column-mapped
converters solve the multiple-tables problem.

The non-class-bound approach, using registered converters, looks like:

converter = cursor.type_mappings.get(type(in_param), None)
if converter is not None:
    in_param = converter.ToDB(in_param)

That removes all interface responsibilities from in_param's class.
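
And the outbound direction is symmetric - a sketch, assuming a
hypothetical result_mappings dict on the cursor keyed by column
position:

def convert_row(cursor, row):
    # run each column through FromDB if a converter is registered for it
    return tuple(
        cursor.result_mappings[i].FromDB(col)
        if i in cursor.result_mappings else col
        for i, col in enumerate(row))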

However, I can see the presence of ToDB() (and FromDB()
classmethods, perhaps) being useful from strictly a convenience point
of view - that is, in the common use case where the persistence of a
particular kind of object has no complex requirements.  But I'm not
sure if DBAPI itself should present both a generalized method as well
as a "convenience/80% case" method (of which ToDB() is the latter).
If I wanted a SQLData-like class in my own application, I could easily
enough create a metaclass approach that automatically registers the
object's type-conversion methods using the generic typing system.
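
Something along these lines (a sketch; the module-level registry dict
stands in for the generic typing system, using Python 2 metaclass
syntax):

type_registry = {}

class AutoRegister(type):
    # any class defining ToDB gets registered automatically
    def __init__(cls, name, bases, dct):
        type.__init__(cls, name, bases, dct)
        if "ToDB" in dct:
            type_registry[cls] = cls

class SpatialData(object):
    __metaclass__ = AutoRegister
    def __init__(self, wkt):
        self.wkt = wkt
    def ToDB(self):
        return self.wkt

# lookup works just like the registry example above:
# type_registry[type(obj)].ToDB(obj)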


>
>> - per-column typemaps:  Here is a common use case.  I am receiving a
>> row which contains two BLOB columns.  One BLOB is image data
>> representing a JPEG image, one BLOB is a pickled instance of a Python
>> class.  I would like to register type converters so that the second
>> column is run through the pickle.loads() function but not the
>> first.  If we are registering various type-handling callables at the
>> cursor level, it should be easy enough to add an optional integer
>> parameter which will bind that type converter to only a specific
>> column position in a result set.  The use case is more obvious in the
>> bind parameter direction.
>
> Yes, I already suggested this (passing the column number to the  
> outbound
> adapter) as a possible extension. However, the use case is convincing
> enough that we should probably allow for a more convenient per-column
> mapping that allows dispatching the conversion to a different adapter
> callable altogether, rather than having to define one adapter that
> returns one thing or another depending on which column it's converting.
>
> To handle this, the cursor could grow a coltypemap attribute, which  
> is a
> mapping of typemaps, keyed on the column number or, maybe more
> conveniently, column name.

Probably both.
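
For the two-BLOB case above, that might look like this (coltypemap
and the converter names are hypothetical, of course):

import pickle

class RawImage:
    # first BLOB column: raw JPEG bytes, pass through untouched
    def FromDB(self, value):
        return value

class PickledObject:
    # second BLOB column: a pickled Python instance, rebuild it
    def FromDB(self, value):
        return pickle.loads(value)

# keyed on column position *or* column name
coltypemap = {
    0: RawImage(),
    "pickled_data": PickledObject(),
}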

>
> In summary, I am open to making the following revisions:
> * The SQLData class would become optional or be eliminated. Inbound  
> type
> conversions between Python objects and the database will be  
> performed by
> a well-defined ToDB method that the object may implement regardless of
> its inheritance tree. If an inbound Python object doesn't define a  
> ToDB
> method, it'll be mapped by the canonical mapping for the particular
> database.

Yeah, that's more or less what I was saying above.

> * The outbound conversion call will receive additional parameters,  
> such
> as the cursor.description tuple, that will allow the adapter to  
> make the
> resulting object stateful with respect to all of its database type
> properties.

It's possible that cursor.description doesn't have all the information
we need - such as with a string column that represents dates, where we
need to decide what string format is represented in the column.
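
Which is where a converter carrying its own format state, bound per
column, helps - a sketch (the column names are hypothetical):

import datetime

class DateAsString:
    # the format is converter state, since cursor.description
    # can't tell us which string layout a given column uses
    def __init__(self, fmt):
        self.fmt = fmt
    def FromDB(self, value):
        return datetime.datetime.strptime(value, self.fmt).date()

# the same application-level date, stored two different ways
coltypemap = {
    "legacy_date": DateAsString("%m/%d/%Y"),
    "new_date": DateAsString("%Y%m%d"),
}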

> * Add an optional coltypemap attribute to the cursor for defining a
> column-specific typemap.

Yeah - just having various maps of typing information seems to me to
represent the one method that is of general use for all cases.



