[DB-SIG] Improved support for prepared SQL statements

M.-A. Lemburg mal at egenix.com
Wed Dec 17 14:31:02 CET 2014


On 09.12.2014 13:24, Daniele Varrazzo wrote:
> On Tue, Dec 9, 2014 at 9:40 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 07.12.2014 22:06, SF Markus Elfring wrote:
>>> Hello,
>>>
>>> At the moment, an interface for parameterised SQL statements (working
>>> with placeholders) is provided by the execute() method of the Cursor
>>> class.
>>> https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.execute
>>>
>>> I assume that the "SQL Statement Object" from the SQLite C interface is
>>> reused there already.
>>> http://sqlite.org/c3ref/stmt.html
>>>
>>> I imagine it would occasionally be more efficient to also offer a
>>> base class like "prepared_statement", so that the parameter
>>> specification does not need to be parsed for every command passed in.
>>> I suggest improving the corresponding preparation and compilation
>>> facilities.
>>> https://bugs.python.org/issue22956
>>
>> Please see our previous discussion on this subject:
>>
>> https://mail.python.org/pipermail/db-sig/2014-March/006090.html
>>
>> This was the first draft result of that discussion:
>>
>> https://mail.python.org/pipermail/db-sig/2014-March/006096.html
>>
>> It has not yet been added to the DB-API as a standard extension.
>> Perhaps we ought to continue that discussion.

Just for better understanding, here's the proposal copied inline:

"""
.prepare(operation)

Prepare a database operation (query or command) without executing
it, e.g. to check for syntax errors, determine the parameter count
or initialize the cursor for subsequent calls to the .execute*()
methods.

The prepared operation string is retained by the cursor to allow
executing the operation without having to prepare the operation again.
In order to benefit from this caching, the .execute*() methods must
be run with the same operation string that was passed to the .prepare()
method.

The call to .prepare() closes any pending result sets on the cursor.

The prepared operation is only available until the next call to
one of the .execute*() methods or another call to the .prepare()
method.

Return values are not defined.
"""

> Sorry for not keeping up with the discussion I'd started myself. I
> have a few observations about your draft:
> 
>     The prepared operation string is retained by the cursor to allow
>     executing the operation without having to prepare the operation again.
> 
> should the prepared statement be accessible through an attribute? Many
> drivers would expose it anyway, so wouldn't it be better to define what
> it should be? ``.prepared``? (A string, or None if no statement is
> prepared.) It could be one of the "optional" extensions, in case some
> drivers cannot retrieve it.

mxODBC has for a long time used cursor.command to expose the
last executed or prepared statement, mostly to make it easy
to rerun an .execute*() method with the same command string
in order to have the driver reuse the cached access plan for
the already prepared statement.

Here's an example:

cursor.prepare('select * from mytable')
cursor.execute(cursor.command)

We could make this or your .prepared attribute an optional
extension.

>     In order to benefit from this caching, the .execute*() methods must
>     be run with the same operation string that was passed to the .prepare()
>     method.
> 
> Not much of a problem with this interface. However, sometimes people
> stick a long string literal directly into the execute() call, e.g.::
> 
>     cur.execute("""LONG SQL""", args)
>     for record in cur: ...
> 
> with this interface the SQL has to be stored in a variable and passed
> to both prepare() and execute():
> 
>     SQL = """LONG SQL"""
>     cur.prepare(SQL)
>     cur.execute(SQL, args)
> 
> in case of prepared statements, wouldn't it be handy to be able to
> pass None as the first execute parameter, which would result in the
> execution of the prepared statement or a ProgrammingError if no
> statement has been prepared?
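As a toy sketch of the execute(None) dispatch Daniele proposes here
(this was not adopted into the draft; the class and the local
ProgrammingError stand-in are illustrative only):

```python
class ProgrammingError(Exception):
    """Stand-in for the DB-API ProgrammingError."""

class ToyCursor:
    def __init__(self):
        self.prepared = None                 # hypothetical .prepared attribute

    def prepare(self, operation):
        self.prepared = operation

    def execute(self, operation, parameters=()):
        if operation is None:                # proposed shortcut
            if self.prepared is None:
                raise ProgrammingError("no statement has been prepared")
            operation = self.prepared        # run the prepared statement
        return (operation, parameters)

cur = ToyCursor()
cur.prepare("SELECT * FROM t WHERE id = ?")
assert cur.execute(None, (42,)) == ("SELECT * FROM t WHERE id = ?", (42,))
```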

I'm not sure I like this kind of implicit definition of the SQL
command. With cursor.command or cursor.prepared it's clear
what will get executed.

Also note that passing in None could mask programming errors, e.g.
if you pass in cursor.command when no prepared command is available
(the attribute would then be None). You'd still get an error, but not
the obvious one of None being invalid as an SQL command parameter.

>     cur.prepare("""LONG SQL""")
>     cur.execute(None, args)
>     for record in cur: ...
> 
> Not a performance saving, of course: just for ergonomics. And with
> cur.prepared one could always refer to the statement explicitly with
> ``cur.execute(cur.prepared, args)``
> 
>     The prepared operation is only available until the next call to
>     one of the .execute*() methods or another call to the .prepare()
>     method.
> 
> This is a bit unclear to me: I believe you mean "until the next call
> to one of the .execute() methods with a different query", right?

Yes. Good point. How about:

"""
The prepared operation is only available until the next call to
one of the .execute*() methods or another call to the .prepare()
method using a different command string than the one used for
preparing the previous operation.
"""

> (otherwise it seems you can execute a prepared statement only
> once...). But does it mean that the well-defined behaviour of the
> driver is to actively clear the prepared statement upon an .execute()
> with a different argument, or just that it is not defined whether the
> previous query is still prepared?

No, I only meant that the driver will forget the
prepared statement when a new statement is used.

> ----
> 
> One issue I would like to have clarified (or clarified that it cannot
> be clarified...) is: who owns the prepared statement? My data points,
> which I don't want to overgeneralise, are:
> 
> - in Postgres, the prepared statements belong to the connection, hence
> any of its cursors may use a statement prepared only once.
> - I work on a large software system implemented in Erlang and based on
> Postgres. The Erlang driver we use doesn't have a connection/cursor
> distinction and manages a connection pool. We automatically prepare
> almost every query we send to the backend so further calls (which in a
> Python transposition would be executed by a different cursor but
> likely by the same connection) benefit from preparation.
> 
> So, one of the options I would like to provide to psycopg users is to
> leverage the possibility of sharing prepared statements across cursors:
> this may or may not imply automatic prepare upon execute; it would
> likely imply that preparing the same statement multiple times on the
> same connection would be a no-op, etc.
> 
> What I am wondering is: are Postgres's features too specific to
> drive the API design? E.g. are there db/driver pairs out there for
> which the prepared statement would belong to the cursor? Or is
> ownership of prepared statements by the connection so common that the
> API design may benefit from it, e.g. making prepared statements shared
> across cursors, allowing more than one prepared statement per
> connection, etc.?

I think this is too specific to a particular driver/backend.

E.g. in ODBC the prepared statement is owned by the cursor
and there may only be one such statement per cursor at any
time.

It is possible to open a few cursors, prepare frequently
used statements on them and then make them accessible via
a cursor pool. This works well (I know, since I implemented
such logic some 14-15 years ago in a large application).
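A minimal sketch of such a cursor pool, keyed by the statement string
(the StubCursor stands in for a real driver cursor, and the factory
argument is an assumption; a real pool would use connection.cursor and
also bound its size to avoid the resource issues mentioned below):

```python
class StubCursor:
    """Stand-in cursor; a real one would come from connection.cursor()."""
    def __init__(self):
        self.prepared = None
        self.prepare_calls = 0

    def prepare(self, operation):
        self.prepared = operation
        self.prepare_calls += 1

class CursorPool:
    """Pool of cursors, each owning one prepared statement (ODBC-style)."""
    def __init__(self, make_cursor):
        self._make_cursor = make_cursor   # cursor factory
        self._pool = {}                   # operation string -> cursor

    def cursor_for(self, operation):
        cur = self._pool.get(operation)
        if cur is None:                   # first use: prepare once
            cur = self._make_cursor()
            cur.prepare(operation)
            self._pool[operation] = cur
        return cur

pool = CursorPool(StubCursor)
c1 = pool.cursor_for("SELECT 1")
c2 = pool.cursor_for("SELECT 1")
assert c1 is c2 and c1.prepare_calls == 1   # prepared only once
```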

> ----
> 
> If the prepare() extension to the API is released as proposed above by
> M-A, my psycopg implementation would be something along this line:
> 
> - implement the cursor.prepare() method, a cursor.prepared attribute;
> - implement the behaviour as defined by the DBAPI;
> - the prepared statements would actually be registered inside the
> connection, so a cursor.execute*() call would automatically use a
> statement if found prepared by any of the connection's cursors (but
> preparation must be still explicit);
> - calls to execute() with unprepared queries wouldn't affect the prepared ones;
> - as an option (e.g. by using a certain Connection subclass declared as
> a DBAPI extension), every executed statement would be automatically
> prepared (there could be a cursor.execute(..., prepared=False)
> parameter to blacklist specific queries).
> 
> Would this implementation be compliant enough or is it a too stretched
> interpretation of the API?
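A toy sketch of the plan above, with the prepared-statement registry
living on the connection and shared by all of its cursors (all names
here are hypothetical, not psycopg's actual internals; the "EXECUTE"
string stands in for running a server-side prepared statement):

```python
class ToyConnection:
    def __init__(self):
        self._prepared = {}               # SQL -> server-side statement name

    def _register(self, operation):
        # Preparing the same statement twice on one connection is a no-op.
        if operation not in self._prepared:
            self._prepared[operation] = "stmt_%d" % len(self._prepared)
        return self._prepared[operation]

    def cursor(self):
        return ToyCursor(self)

class ToyCursor:
    def __init__(self, connection):
        self._connection = connection
        self.prepared = None

    def prepare(self, operation):
        self.prepared = operation
        self._connection._register(operation)

    def execute(self, operation, parameters=()):
        # Use the shared prepared statement if *any* cursor prepared it;
        # unprepared queries run as-is and don't touch the registry.
        name = self._connection._prepared.get(operation)
        return ("EXECUTE " + name if name else operation, parameters)

conn = ToyConnection()
c1, c2 = conn.cursor(), conn.cursor()
c1.prepare("SELECT * FROM t WHERE id = ?")
result = c2.execute("SELECT * FROM t WHERE id = ?", (1,))
assert result[0].startswith("EXECUTE")    # c2 reuses c1's preparation
```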

This would work out, I guess. You just need to be careful with
having too many prepared cursors in the pool:

* the cursors require memory on the client side and often
  also on the server side

* cursors can keep result sets open, blocking connections and tying
  up significant resources on both the client and server side

To address part of the latter problem, we added a cursor.flush()
method in mxODBC.

> Please understand that these points are not raised to drive the API
> where my own project wants: as my knowledge of other DBMSs is limited
> I'm trying to understand if some of the features IMO valuable I've
> found in other software systems can be implemented generally by other
> DBMS/driver pairs, or if I'd still be free to implement such features
> for the Postgres user benefit while maintaining the DBAPI spirit and
> guidelines.

Sure, that's the purpose of the DB-API - it is extensible by
design.

The DB-API standard extensions were added to address proliferation
of such additions. If possible, the database modules should try
to standardize on these extensions, which is why discussions such
as these are useful.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 17 2014)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2014-12-11: Released mxODBC Plone/Zope DA 2.2.0   http://egenix.com/go67

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

