[DB-SIG] Extending DB-API

Fri Nov 17 00:36:26 CET 2006

M.-A. Lemburg wrote:
>> The major things I think we can standardize:
>>
>> * There's no common way to configure databases.  I'd like to see a 
>> single URI syntax that everyone can use.  This should be modestly 
>> extensible via query string parameters.
> 
> There has been some discussion about this. I believe that this
> is something a wrapper would have to implement since part of the
> URI is mostly going to be the name of the database module to use.
> 
> Also, I'm not sure whether a URI syntax would be of any benefit,
> since the parameters required by various backends are rather
> diverse.
> 
> ODBC uses something called a data source definition string
> which is a simple string format of "<key>=<value>;" pairs.
> 
> Only a few of the keys are standardized. Many depend on the
> backend being used and often vary between ODBC drivers.
> 
> I doubt that you could convert a URI into such a string in
> a sensible way. In the end, you'd probably have to use
> the query part of the URI to pass in all the parameters
> that are non-standard in the URI syntax.
> 
> This wouldn't really help anyone, since it's just another
> way of writing things, but doesn't make things easier for
> the user.

Having a consistent way to configure databases through a single string 
would be very helpful.  I'm not particularly set on one format.  I do 
think a string -- not Python data structures -- is the right way to do 
this configuration.

Right now the dbapi doesn't define any consistent signature for the 
connect() function (and in practice the signatures are not at all 
consistent with each other).  There's two parts to this -- first, given 
a string there needs to be a way to find the connection factory, then 
the connection factory needs to accept a string.  I'd like it if extra 
arguments could also be parsed out, so for instance logging could be 
indicated through the connection string (or other conveniences).  This 
just means that the string should be reasonably extended in some way, 
not that the connection factory has to handle any of these extended bits 
of information.

> This is different at the application level, since an
> application will typically only support a handful of
> backends. In this case a simple URI would suffice,
> since all the other details would be added to the
> connection parameters at lower level of the application,
> e.g. in the database abstraction layer.

I don't see how this matters; the supported backends changes over time, 
and consistency among applications where backends overlap is still good. 
  Application's can have their support extended by external libraries, 
if there is a reasonable way to do this.  Right now there usually isn't.

>> * Given a database connection there should be a well-documented strategy 
>> for discovering the type of the connection (including the server's 
>> version) and loading up code specific to that module and some interface. 
> 
> In mxODBC we have these connection attributes:
> 
> .dbms_name
> 	String identifying the database manager system.
> .dbms_version
> 	String identifying the database manager system version.
> 
> .driver_name
> 	String identifying the ODBC driver.
> .driver_version
> 	String identifying the ODBC driver version.
> 
> They have proven to be quite useful, esp. when it comes to
> coding against specific backends.

Yes, this would be useful.

>>   This allows compatibility code to be developed separately from the 
>> database connection modules, and separately from consumers like ORMs. 
>> This would also give a clear place to build database introspection 
>> tools, without tying that work to the release schedules or development 
>> process of the current database modules.  Realistically those existing 
>> modules are developed fairly slowly and conservatively, and require 
>> skill in things like writing C extensions.  Compatibility layers have 
>> none of these qualities.
>>
>> * Unified exceptions.  This can be done currently with monkeypatching 
>> the inheritance hierarchy, but they aren't unified to anything in 
>> particular that you can rely on.
> 
> We already have a standard way for this: all exceptions should be
> exposed on the connection object as attributes.
> 
> This makes writing polymorphic code easy.

Currently you have to know the connection in order to catch the 
exception.  You cannot catch an exception when the code the connection 
is not directly exposed to your code.  There's lots of good reasons you 
might want to catch an exception when you don't have a handle on the 
connection that might raise it.

>> * Figure out a strategy for parameter styles.  Maybe this just means a 
>> reasonable way to handle SQL in a more abstract way than as strings with 
>> markers (that way no update is required to the dbapi).  Or maybe 
>> something more sophisticated.  Or we could even be lazy and use 
>> %s/pyformat, which is the only marker that can be easily translated to 
>> other markers.
> 
> See past discussions: pyformat is probably the worst of all
> parameter styles.

Yes, it's annoying.  It's also the only format with decent 
implementations.  Everyone else thinks parsing SQL is easy.  And maybe 
it can be done, but of course it is not easy.

> Note that any change in this respect will break *a lot* of existing
> and working code.

If it's an abstraction on top of the current dbapi then it isn't a 
problem.  I have no desire to break existing code, and would rather 
avoid changing any existing methods defined through the dbapi.  If there 
must be backward compatibilities, we should just define new method names.

> Perhaps we ought to make the parameter style a connection
> parameter that's writeable and then agree on a very limited
> set of required styles - perhaps just the qmark and the
> numeric style since these are easy to implement and can be
> mapped to all other styles.

You have to know something about the underlying SQL to do this.  For 
instance:

   "UPDATE foo SET x = 'bob\'s your uncle?'"

On some databases this is valid, and some not, and it effects whether 
the ? is a marker or not.  With a bit of thought I could come up with 
SQL that would be valid on both kinds of databases, but with different 
parses.

>> * Maybe some work on database connection pooling strategies.  Maybe this 
>> can just be library code.  I think we need a little more data on the 
>> threading restrictions than dbapi gives (sqlite in particular doesn't 
>> fit into any of the current levels).
> 
> Connection pooling is something which higher level interfaces
> have to manage and provide, since database drivers can't
> possibly know which connections to pool and in what way to suit
> the application needs.
> 
> Note that simply placing connections into a dictionary
> is not good enough, since each connection keeps state and
> thus may not be reusable.

Yes; mostly there's just some more information we need to implemented 
this separately.  Specifically whether connections can move between 
threads; sqlite is picky about this, and dbapi seems vague about it.

Also, sqlite :memory: databases can't have more than one connection, so 
you just can't pool connections.  There's no way to detect this from the 
outside.

>>  From there I see some other useful database standards:
>>
>> * A standard transaction container.  The Zope transaction container is a 
>> reasonable and straight-forward implementation (though it needs to be 
>> better extracted from Zope).
> 
> I don't understand this one.
> 
> The industry standards for transaction management are
>  * X/Open DTP (XA)
>  * MS DTC
> 
> See http://docs.openlinksw.com/mt/xamt.html for a good
> overview of how XA works. The MS DTC is described here:
> http://msdn2.microsoft.com/en-US/library/ms191440.aspx
> 
> Both interfaces are C level interfaces.

Ugh.  Well, that's not really a Python standard.

Here's Zope's:
http://svn.zope.org/ZODB/trunk/src/transaction/

Probably the interface is the best description:
http://svn.zope.org/ZODB/trunk/src/transaction/interfaces.py?rev=70066&view=markup

I assume a bridge from that interface to either of the transaction 
managers you give would be possible; since those are C-level *some* 
bridge is inevitable anyway.

>> * A standard way to retrieve database configuration and connections. 
>> This way database library/framework code can be written in a reasonably 
>> abstract way without worrying about deployment concerns.
> 
> Not sure what you mean here.

Given some database code, a strategy to fetch the "current" database 
connection or connection factory, or something like that.  The 
particular motivation here is that given this and some of the other 
pieces, web frameworks could just support "databases" and wouldn't 
require any specific code related to any one database 
wrapper/library/ORM.  And the story would be relatively consistent 
across environments -- not just web frameworks, but potentially any 
database-consuming system.

More concretely, something like:

   conn = get_database_connection('myapp')

Where somewhere else you configure a specific database for 'myapp'.  The 
name 'myapp' could be used to connect to multiple databases at the same 
time, by using different names for potentially different database 
connections; maybe using hierarchical dotted names kind of like the 
logging module does for logger names.  This is a convention built on 
dbapi, not something dbapi would be involved in itself.

-- 
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org