[DB-SIG] Improved support for prepared SQL statements
mike_mp at zzzcomputing.com
Fri Dec 19 00:12:36 CET 2014
M.-A. Lemburg <mal at egenix.com> wrote:
> On 18.12.2014 23:19, Michael Bayer wrote:
>>> On Dec 18, 2014, at 3:57 PM, Michael Bayer <mike_mp at zzzcomputing.com> wrote:
>>>> On Dec 18, 2014, at 3:39 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> That would make the entire feature a non-starter for me. SQLAlchemy
>>>>> doesn’t hold cursors open beyond a single statement. My users would
>>>>> very much want a prepared-statement-per-transaction object.
>>>> Perhaps you ought to reconsider this approach. Creating and closing
>>>> cursors all the time does involve somewhat of an overhead.
>>> I will attempt to try this, though I am anticipating that DBAPIs are
>>> going to be problematic with this approach. One concrete example is the
>>> case where on psycopg2, we offer the option to use a “named” cursor,
>>> which on psycopg2 has the effect of maintaining the state of this cursor
>>> on the server side. However psycopg2 throws an error if such a cursor is
>>> used for anything other than a SELECT statement. So right there, we need
>>> more than one cursor based on the contents of the SQL. This is kind of a
>>> very specific situation though, I’ll see if the approach in general
>>> produces issues.
>> So I tried this, and pleasantly, there’s not *too* much side effect,
>> meaning a quick test against a few databases didn’t lead to many issues.
>> Where there were issues are in the tests relating to connection
>> invalidation within a 2pc context; I didn’t dig in to what the issues are
>> but its possible that the MySQL and psycopg2 DBAPIs have some more quirks
>> with cursors when 2pc is used (or my changes just were missing some edge
>> However, if I were to change this for real, it means that small bump in
>> stability now gets sent out to everyone, working on databases I don’t
>> even have regular access to such as sybase and DB2, and whatever quirks
>> of reusing cursors might exist that I’ve not been able to test; many
>> years of effort and user-feedback has gone into getting our Connection
>> class to be stable and predictable in an extremely wide variety of
>> situations (where we’re talking here about failure modes: disconnects,
>> deadlocks, timeouts, intercepting these conditions perfectly and getting
>> the system back into a stable state as efficiently and crash-free as
>> possible), and here we’re presented with the potential of overhead from
>> opening and closing many cursors, rather than keeping one around for…I
>> would presume the transaction scope.
>> This is exactly what I was getting at in my other email. We are
>> considering a significant change in a key area of stability in the name
>> of “reducing overhead”, so is it really worth it? For the drivers that
>> the vast majority of my users care about at least, the effect would
>> appear to be negligible, hitting barely a 1% difference with the pure
>> Python drivers that have much bigger performance problems just by being
>> in pure Python:
>> psycopg2 single cursor: 6.159881 (10000 executions)
>> psycopg2 multi cursor: 6.173749 (10000 executions)
>> pg8000 single cursor: 28.213494 (1000 executions)
>> pg8000 multi cursor: 28.620359 (1000 executions)
>> mysqldb single cursor (10000 executions): 11.702930
>> mysqldb multi cursor (10000 executions): 11.809935
>> mysql connector single cursor (1000 executions): 25.707400
>> mysql connector multi cursor (1000 executions): 26.096313
> The results are somewhat biased, since your test spends most of
> the time with fetching data from the database, not with
> the creation and deallocation of the cursor.
> You should try this with a table that only has e.g. 10-100 entries or
> a query which only yields a few result rows for comparison.
Good point, here is five rows fetched and the number of runs multiplied by
psycopg2 single cursor: 9.307149 (100000 executions)
psycopg2 multi cursor: 9.339134 (100000 executions)
pg8000 single cursor: 3.406491 (10000 executions)
pg8000 multi cursor: 3.391371 (10000 executions)
mysqldb single cursor (100000 executions): 8.512559
mysqldb multi cursor (100000 executions): 9.355145
mysql connector single cursor (10000 executions): 3.218654
mysql connector multi cursor (10000 executions): 4.089793
Differences are still negligible for Postgresql but for MySQL they are now
tangible, but less so for the C APIs which are already an order of magnitude
faster than the Python API. That is, the speed hit in MySQL-connector is
nothing compared to the slowness introduced by it being in pure Python.
>> psycopg2.ProgrammingError: can't call .execute() on named cursors more than once
>> Wow! So that’s definitely that :).
> I think there's a misunderstanding here. A named cursor refers to
> a database cursor pointing into a result set. Once that result
> set has been created and the cursor points to it, you cannot
> run another .execute() against it, unless you first close the
> named cursor - simply because it is already in use.
This is in fact why SQLAlchemy works the way it does in the first place;
cursors have always to me been something that you allocate for a single
statement’s result set. In the old ASP days, microsoft’s tools always handed
us a scrollable, updatable cursor, in fact. PG’s named behavior seems
natural to me and it is surprising to me that you’re suggesting that
cursor-per-statement is a poor practice.
More information about the DB-SIG