[DB-SIG] Improved support for prepared SQL statements

M.-A. Lemburg mal at egenix.com
Thu Dec 18 23:55:50 CET 2014


On 18.12.2014 23:19, Michael Bayer wrote:
> 
>> On Dec 18, 2014, at 3:57 PM, Michael Bayer <mike_mp at zzzcomputing.com> wrote:
>>
>>>
>>> On Dec 18, 2014, at 3:39 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>
>>>
>>>> That would make the entire feature a non-starter for me.    SQLAlchemy doesn’t hold cursors open beyond a single statement.    My users would very much want a prepared-statement-per-transaction object.
>>>
>>> Perhaps you ought to reconsider this approach. Creating and closing
>>> cursors all the time does involve somewhat of an overhead.
>>
>> I will attempt to try this, though I am anticipating that DBAPIs are going to be problematic with this approach.    One concrete example is the case where on psycopg2, we offer the option to use a “named” cursor, which on psycopg2 has the effect of maintaining the state of this cursor on the server side. However psycopg2 throws an error if such a cursor is used for anything other than a SELECT statement. So right there, we need more than one cursor based on the contents of the SQL.   This is kind of a very specific situation though, I’ll see if the approach in general produces issues.
> 
> So I tried this, and pleasantly, there’s not *too* much side effect, meaning a quick test against a few databases didn’t lead to many issues.   Where there were issues are in the tests relating to connection invalidation within a 2pc context; I didn’t dig in to what the issues are but its possible that the MySQL and psycopg2 DBAPIs have some more quirks with cursors when 2pc is used (or my changes just were missing some edge cases).
> 
> However, if I were to change this for real, it means that small bump in stability now gets sent out to everyone, working on databases I don’t even have regular access to such as sybase and DB2, and whatever quirks of reusing cursors might exist that I’ve not been able to test; many years of effort and user-feedback has gone into getting our Connection class to be stable and predictable in an extremely wide variety of situations (where we’re talking here about failure modes: disconnects, deadlocks, timeouts, intercepting these conditions perfectly and getting the system back into a stable state as efficiently and crash-free as possible), and here we’re presented with the potential of overhead from opening and closing many cursors, rather than keeping one around for…I would presume the transaction scope.   
> 
> This is exactly what I was getting at in my other email.  We are considering a significant change in a key area of stability in the name of “reducing overhead”, so is it really worth it?    For the drivers that the vast majority of my users care about at least, the effect would appear to be negligible, hitting barely a 1% difference with the pure Python drivers that have much bigger performance problems just by being in pure Python:
> 
> psycopg2 single cursor: 6.159881 (10000 executions)
> psycopg2 multi cursor: 6.173749 (10000 executions)
> 
> pg8000 single cursor: 28.213494 (1000 executions)
> pg8000 multi cursor: 28.620359 (1000 executions)
> 
> mysqldb single cursor (10000 executions): 11.702930
> mysqldb multi cursor (10000 executions): 11.809935
> 
> mysql connector single cursor (1000 executions): 25.707400
> mysql connector multi cursor (1000 executions): 26.096313

The results are somewhat biased, since your test spends most of
the time with fetching data from the database, not with
the creation and deallocation of the cursor.

You should try this with a table that only has e.g. 10-100 entries or
a query which only yields a few result rows for comparison.

In e.g. user interface applications using databases, most queries
only return 1-100 rows, and of those, the number of queries which
return just a single row are the most common cases.

For data processing applications, the situation is different
and you do work with large result sets. The cursor creation
overhead doesn't really matter much for those.

> I also had the idea that maybe the above cases don’t show much because these drivers aren’t using pure “server side” cursors in the first place; I know in ODBC, we have more of a formal “cursor” construct in the protocol and that is probably what you’re referring to (though I googled that just now, and per http://msdn.microsoft.com/en-us/library/ms130794.aspx it says explicit cursors are rarely used and ODBC automatically opens a cursor for individual result sets).   So I decided to try psycopg2 with a “named” cursor, and got this:
> 
> 	psycopg2.ProgrammingError: can't call .execute() on named cursors more than once
> 
> Wow!  So that’s definitely that :).

I think there's a misunderstanding here. A named cursor refers to
a database cursor pointing into a result set. Once that result
set has been created and the cursor points to it, you cannot
run another .execute() against it, unless you first close the
named cursor - simply because it is already in use.

> If my DB2 users upgrade to SQLAlchemy 1.0 and start experiencing less stable behavior with connections, they’d be fairly upset if I told them it was in the name of a 0.3% overhead improvement.    Kind of like in the case of prepared statements, if the server-side cursors can truly be safely recycled, this is highly dependent on the database and the DBAPI methodology in use, and again the DBAPI could offer this as a configurable feature not exposed on the outside (the way ODBC can offer transparent connection pooling, if you will).
> 
> 
> 
> 
> import random
> import timeit
> import psycopg2
> import pg8000
> import MySQLdb
> from mysql import connector as myconnpy
> 
> 
> def setup(conn):
>     cursor = conn.cursor()
>     cursor.execute("drop table if exists data")
>     cursor.execute(
>         "create table data (id integer primary key, data VARCHAR(100))")
>     cursor.executemany(
>         "insert into data (id, data) values (%s, %s)",
>         [
>             (counter + 1, "some value: %d" % random.randint(0, 10000),)
>             for counter in xrange(1000)
>         ]
>     )
>     cursor.close()
> 
> 
> def run_test_multi_cursor(conn):
>     cursor = conn.cursor()
>     cursor.execute("select * from data")
>     cursor.fetchall()
>     cursor.close()
> 
> 
> def run_test_single_cursor(conn, cursor):
>     cursor.execute("select * from data")
>     cursor.fetchall()
> 
> 
> def do_time(dbapi, single_cursor, style, number):
>     global conn
> 
>     if style == 'postgresql':
>         conn = dbapi.connect(
>             user='scott', password='tiger',
>             database='test', host='localhost')
>     elif style == 'mysql':
>         conn = dbapi.connect(
>             user='scott', passwd='tiger',
>             db='test', host='localhost')
> 
>     if single_cursor:
>         global cursor
>         cursor = conn.cursor()
>         time = timeit.timeit(
>             "run_test_single_cursor(conn, cursor)",
>             "from __main__ import run_test_single_cursor, setup, "
>             "conn, cursor; setup(conn)",
>             number=number
>         )
>         cursor.close()
>     else:
>         time = timeit.timeit(
>             "run_test_multi_cursor(conn)",
>             "from __main__ import run_test_multi_cursor, "
>             "setup, conn; setup(conn)",
>             number=number
>         )
> 
>     conn.close()
>     return time
> 
> psycopg2_cursor_time = do_time(psycopg2, True, 'postgresql', 10000)
> psycopg2_non_cursor_time = do_time(psycopg2, False, 'postgresql', 10000)
> pg8000_cursor_time = do_time(pg8000, True, 'postgresql', 1000)
> pg8000_non_cursor_time = do_time(pg8000, False, 'postgresql', 1000)
> mysqldb_cursor_time = do_time(MySQLdb, True, 'mysql', 10000)
> mysqldb_non_cursor_time = do_time(MySQLdb, False, 'mysql', 10000)
> mysqlconn_cursor_time = do_time(myconnpy, True, 'mysql', 1000)
> mysqlconn_non_cursor_time = do_time(myconnpy, False, 'mysql', 1000)
> 
> 
> print("psycopg2 single cursor: %f (10000 executions)" % psycopg2_cursor_time)
> print("psycopg2 multi cursor: %f (10000 executions)" % psycopg2_non_cursor_time)
> print("pg8000 single cursor: %f (1000 executions)" % pg8000_cursor_time)
> print("pg8000 multi cursor: %f (1000 executions)" % pg8000_non_cursor_time)
> print("mysqldb single cursor (10000 executions): %f" % mysqldb_cursor_time)
> print("mysqldb multi cursor (10000 executions): %f" % mysqldb_non_cursor_time)
> print("mysql connector single cursor (1000 executions): %f" % mysqlconn_cursor_time)
> print("mysql connector multi cursor (1000 executions): %f" % mysqlconn_non_cursor_time)
> 
> 
> 
>>
>>> Note that the use of cached prepared cursors for performance
>>> reasons is only one use of having the prepare step available
>>> on cursors. In practice, the more important one is to be able
>>> check SQL statements for errors without executing them and
>>> possibly causing a rollback on the transaction.
>>
>> Which kinds of errors are you referring to, if the statement has not been invoked, I would imagine this refers only to syntactical errors?  What kind of application contains SQL that may have syntactical errors that only become apparent at runtime and can’t be eliminated during testing?   
>>
>>
>>
>>
>>>
>>> -- 
>>> Marc-Andre Lemburg
>>> eGenix.com
>>>
>>> Professional Python Services directly from the Source  (#1, Dec 18 2014)
>>>>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>>>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
>>> ________________________________________________________________________
>>> 2014-12-11: Released mxODBC Plone/Zope DA 2.2.0   http://egenix.com/go67
>>>
>>> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
>>>
>>>  eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>>>   D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>>>          Registered at Amtsgericht Duesseldorf: HRB 46611
>>>              http://www.egenix.com/company/contact/
>>
>> _______________________________________________
>> DB-SIG maillist  -  DB-SIG at python.org
>> https://mail.python.org/mailman/listinfo/db-sig

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 18 2014)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2014-12-11: Released mxODBC Plone/Zope DA 2.2.0   http://egenix.com/go67

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


More information about the DB-SIG mailing list