[DB-SIG] Improved support for prepared SQL statements

Michael Bayer mike_mp at zzzcomputing.com
Mon Dec 22 06:59:13 CET 2014

M.-A. Lemburg <mal at egenix.com> wrote:

> On 21.12.2014 20:56, Michael Bayer wrote:
>> M.-A. Lemburg <mal at egenix.com> wrote:
>>> Whether or not a cursor is client or server side depends on many
>>> things and is generally database backend specific. MS SQL Server
>>> for example will default to server side cursors and revert to
>>> client side for some special cases:
>>> http://msdn.microsoft.com/en-us/library/ms131331.aspx
>> this is why I’m so troubled by the concept that a user of the DBAPI needs to
>> be aware of this backend-specific issue as a guide as to whether or not the
>> complicated and itself-expensive pooling of cursors on the application side
>> is necessary in order to gain performance (performance gains that need to be
>> greater than the performance lost by the complexity of pooling the cursors
>> behind some facade); not just at the per-database level, but at the use-case
>> level, as you state here with SQL Server’s driver choosing server- or
>> client- side cursors based on different conditions. I would much prefer that
>> DBAPIs handle the optimization of this underlying backend detail for us. It
>> is too low-level an issue to be exposed in a Python API as a normal matter
>> of course. 
> This is hardly something a Python DB-API module could manage or
> abstract away. As you can see in the above article, the database itself
> decides which variant to use. AFAIK, the driver does not provide a way to
> tell the database to use one or the other.

In practice, this aspect of the driver is considered by most to be an implementation detail that people usually don’t want to have to deal with except in optimization scenarios.   The example of SQL Server here is an outlier; most of the DBAPIs use only client- or server- side cursors up front and that’s it.   

JDBC exposes this concept via the Statement/PreparedStatement->ResultSet objects.  They don’t call any of these things a “cursor”, even though in practice you’d probably say  (I think you did say this earlier) that Statement->ResultSet or PreparedStatement->ResultSet is really the same as a “cursor” (so in that sense I will concede that since DBAPI has an explicit cursor already, that is obviously where the “prepared” statement has to start).   Though I like that JDBC changed the names and made roles more agnostic and detached from the underlying details, leaving it up to the driver as to how things will run behind the scenes.  Nobody is worried about re-using cursors or not with JDBC as they are not directly exposed, they are perhaps worried about re-using PreparedStatement objects, though in practice I think this is not typical.  The issue of whether things are invoked server or client side is not exposed very directly either, it is somewhat exposed though the setFetchSize() method.

What I liked about JDBC is that when using it, we aren’t nearly as often making these driver-specific decisions in application code all the time; JDBC’s exposure of PreparedStatement while not exposing “cursor” directly seemed to hit this “sweet spot” nicely.    The “cursor” in DBAPI seems a little awkward, in that as we talk about it, we continuously refer to the very specific cursor behaviors of ODBC, which is an old and very finely detailed low level API, as rationales for some of its odder corners, yet in pep-249 itself I see even a note that the entire “cursor” concept may have to be emulated for some backends that don’t have a real cursor.   

I’ve no doubt that DBAPI 3 is still going to have a cursor front-and-center.  But I’d at least like it if in a DBAPI 3 that these concepts of server and client side cursors as well as that of prepared statements can be discussed and exposed in much greater and consistent detail, from the perspective of which should be used by implementors and when, and how the presence of these behaviors and capabilities can be exposed in a consistent way to consuming applications.  As it stands, DBAPIs generally choose one or the other, rarely provide both, and hardly (with notable exceptions) make it at all clear which one it actually is, yet I’m tasked with having to know when the “cursor” resource is either practically free or significantly expensive, whether the object is reusable or whether it isn’t.   

More information about the DB-SIG mailing list