From skip at pobox.com Thu Mar 13 21:50:24 2014
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 13 Mar 2014 15:50:24 -0500
Subject: [DB-SIG] More tightly defining Cursor.callproc in PEP 249
Message-ID:

Apologies in advance for this long mail. Also, I warn you that I've only subscribed to db-sig today, so I am blissfully unaware of any related discussions you might have had on this topic in the past. I skimmed the last couple months of message subjects and only saw PEP 249 referenced once (apparently incorrectly).

At work, we use the python-sybase adapter in all our Python code:

http://python-sybase.sourceforge.net/

We are in the midst of upgrading our code to use a newer version of our internal wrapper of that module from a version which relies on python-sybase 0.36 to a version which relies on python-sybase 0.40pre2. Yesterday, one of the other programmers reported that he could no longer retrieve stored procedure return values. In 0.38 and before, the return value was tacked onto the front of the first result set. In 0.39, an attempt was made to change how return values were processed, but that change introduced a new bug. I fixed that problem today.

After this episode, I was motivated to revisit PEP 249's discussion of stored procedures. I barely use SQL in my own work, relying heavily on the work of others, and rarely formulating sophisticated queries myself. Until I had to fix this bug, I didn't even realize that stored procedures had return statements. I thought they transmitted values out using some other syntax.

So, now I know that, besides being optional, and being widely different across different databases, stored procedures have three ways of returning results:

1. Zero or more result sets

2. Output (or in/out) parameters

3. Return values

Processing result sets is well-defined. Call the cursor's fetch*() and nextset() methods repeatedly until nextset() returns False.
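A minimal sketch of that loop, assuming a PEP 249 cursor. The names iter_result_sets and FakeCursor are hypothetical, and the stub only stands in for a real driver. PEP 249 specifies that nextset() returns None (rather than False) when no further set exists, so the loop tests for any false value; it also checks cursor.description first, since a procedure may produce no result set at all.

```python
def iter_result_sets(cursor):
    """Yield each result set of the last operation as a list of rows."""
    # PEP 249: description is None when the operation produced no rows.
    if cursor.description is None:
        return
    while True:
        yield cursor.fetchall()
        # PEP 249: nextset() returns None when there are no more sets
        # and a true value otherwise.
        if not cursor.nextset():
            break


class FakeCursor:
    """Hypothetical stand-in for a driver cursor with canned result sets."""

    def __init__(self, result_sets):
        self._sets = [list(rows) for rows in result_sets]
        self._index = 0

    @property
    def description(self):
        # Real drivers return column metadata; any non-None value will do here.
        return (("column",),) if self._sets else None

    def fetchall(self):
        # Hand out the remaining rows of the current set exactly once.
        rows, self._sets[self._index] = self._sets[self._index], []
        return rows

    def nextset(self):
        if self._index + 1 < len(self._sets):
            self._index += 1
            return True
        return None


two_sets = list(iter_result_sets(FakeCursor([[(1,), (2,)], [("a",)]])))
no_sets = list(iter_result_sets(FakeCursor([])))
```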
I think the specification of the other two ways of transmitting values out of a stored procedure could use some work, though.

The solution to return values in the python-sybase module is to set a status_result attribute on the Cursor object. Structurally, it looks like a one-row, one-element result set, e.g. [(1,)] if the stored procedure executed "return 1". This seems like a reasonable way to go about this, though I have no idea how complex return values can be, or how best to indicate that a stored procedure didn't return a value (None or an empty list both seem reasonable).

Python-sybase allows you to specify output parameters in parameter dictionaries using the OUTPUT function. For example, adapted from some code I maintain:

    args = {
        "@date1": today,
        "@date2": Sybase.OUTPUT(date),
        "@symbol": "IBM",
    }
    conn = db.pool.get_connection()
    output = conn.callproc(conn.cursor, "previous_trading_day", args)

In this case, the @date2 key represents an output parameter, and our wrapper rewrites that value. I don't know if we do this in our wrapper because the python-sybase authors intended not to do this rewriting, or if I'm working around an actual bug. The input side of in/out parameters is (I presume) passed in through the call to Sybase.OUTPUT(...).

In a separate email thread, Marc-Andre suggested that we could add a callfunc() method to Cursor objects. I'm not entirely sure that's necessary, as (at least in the Sybase case) a stored procedure can return all three types of values in the same call.

So, to draw this exceedingly long mail to a close, I propose:

1. PEP 249 should document how stored procedure return values are made available to the caller of Cursor.callproc.

2. PEP 249 should more precisely document how output and in/out parameters are specified and processed.
Skip

From daniele.varrazzo at gmail.com Mon Mar 24 17:53:53 2014
From: daniele.varrazzo at gmail.com (Daniele Varrazzo)
Date: Mon, 24 Mar 2014 16:53:53 +0000
Subject: [DB-SIG] Prepared statements in python drivers
Message-ID:

Hello,

lately there has been some interest in adding prepared statements support in Psycopg. It's a feature whose usefulness I can see, but which I haven't used extensively enough to make up my mind about the best interface to present to the driver clients.

A toy prototype that hasn't led to a great deal of discussion is described in this article: . This implementation is explicit: the cursor has a prepare() method, and execute() can be run without the statement, only with the parameters, which would call the prepared statement.

Other implementations are possible of course. A cursor may automatically prepare a statement and then execute it, but this adds a network roundtrip, uses more server resources and can lead to suboptimal plans (because the planner doesn't know the parameters and so can't estimate a filter's selectivity etc.). A cursor may support a single prepared statement or many, in which case a cache invalidation policy could be needed etc.

I was wondering a few things:

- is there enough consensus - not only in the Python world - about how to implement a prepared statements interface on a db driver?
- do other Python drivers implement stored procedures? Do they do it in a uniform way?
- is the topic generic enough for the DB-SIG to suggest a DB-API interface or is it too database specific and the interface would be better left to the single driver?

Thank you very much for any help provided.

-- Daniele

From mal at egenix.com Mon Mar 24 18:07:58 2014
From: mal at egenix.com (M.-A.
Lemburg)
Date: Mon, 24 Mar 2014 18:07:58 +0100
Subject: [DB-SIG] Prepared statements in python drivers
In-Reply-To:
References:
Message-ID: <5330666E.6090806@egenix.com>

Hi Daniele,

On 24.03.2014 17:53, Daniele Varrazzo wrote:
> Hello,
>
> lately there has been some interest in adding prepared statements
> support in Psycopg. It's a feature of which I see the usefulness but
> which I haven't used extensively enough to make my mind about be the
> best interface to present it to the driver clients.
>
> A toy prototype that hasn't lead to a great deal of discussion is
> described in this article:
> .
> This implementation is explicit: the cursor has a prepare() method and
> execute can be run without the statement, only with the parameters,
> which would call the prepared statements.

In mxODBC we use the following approach, which is based on the fact that cursor.execute*() methods may cache the command argument to enhance performance:

cursor.prepare(command)
    prepare the command and set cursor.command to command

cursor.command
    last executed/prepared command

You then use this as follows, without having to change the DB-API .execute*() method signatures:

    cursor.prepare('select * from mytable where x = ?')
    cursor.execute(cursor.command, [1])
    results = cursor.fetchall()

> Other implementations are possible of course. A cursor may prepare
> automatically a statement and then execute it, but this adds a network
> roundtrip, uses more server resources and can lead to suboptimal plans
> (because the planner doesn't know the parameter so can't decide about
> a filter selectivity etc.). A cursor may support a single prepared
> statement or many, in which case a cache invalidation policy could be
> needed etc.
>
> I was wondering a few things:
>
> - is there enough consensus - not only in the Python world - about how
> to implement a prepared statements interface on a db driver?

It's a standard approach in the ODBC world, so it should be widespread enough as a concept.
> - do other Python drivers implement stored procedures? Do they do it
> in a uniform way?

Hmm, what do stored procedures have to do with this?

> - is the topic generic enough for the DB-SIG to suggest a DB-API
> interface or is it too database specific and the interface would be
> better left to the single driver?

We could add a standard extension for supporting a separate prepare step.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Mar 24 2014)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2014-03-29: PythonCamp 2014, Cologne, Germany ... 5 days to go
2014-04-09: PyCon 2014, Montreal, Canada ... 16 days to go
2014-04-29: Python Meeting Duesseldorf ... 36 days to go

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From mal at egenix.com Tue Mar 25 00:06:39 2014
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 25 Mar 2014 00:06:39 +0100
Subject: [DB-SIG] More tightly defining Cursor.callproc in PEP 249
In-Reply-To:
References:
Message-ID: <5330BA7F.7050403@egenix.com>

Hi Skip,

sorry for the late reply. For some reason I did not see your email to the list in my inbox.

On 13.03.2014 21:50, Skip Montanaro wrote:
> [...]
>
> After this episode, I was motivated to revisit PEP 249's discussion of
> stored procedures. I barely use SQL in my own work, relying heavily
> on the work of others, and rarely formulating sophisticated queries
> myself. Until I had to fix this bug, I didn't even realize that
> stored procedures had return statements. I thought they transmitted
> values out using some other syntax.
The DB-API syntax goes like this:

    result_parameters = cursor.callproc(procname, parameters)

with procname being the procedure name and parameters the sequence of parameters. The returned result_parameters is a copy of the parameters sequence with in/out and output parameters replaced with their new values.

> So, now I know that, besides being optional, and being widely
> different across different databases, stored procedures have three
> ways of returning results:
>
> 1. Zero or more result sets
>
> 2. Output (or in/out) parameters
>
> 3. Return values

That last item usually applies to stored functions, not procedures - even though there are databases which let procedures return values, just as there are databases which don't have procedures and instead call everything a function.

Additionally, some databases don't allow in/out and output parameters for stored functions.

> Processing result sets is well-defined. Call the cursor's fetch*()
> and nextset() methods repeatedly until nextset() returns False. I
> think the specification of other two ways of transmitting values out
> of a stored procedure could use some work though. The solution to
> return values in the python-sybase module is to set a status_result
> attribute on the Cursor object. Structurally, it looks like a one-row
> one-element result set, e.g.: [(1,)] if the stored procedure executed
> "return 1". This seems like a reasonable way to go about this, though
> I have no idea how complex return values can be, or how best to
> indicate that a stored procedure didn't return a value (None or an
> empty list both seem reasonable).
>
> Python-sybase allows you to specify output parameters in parameter
> dictionaries using the OUTPUT function.
> For example, adapted from some
> code I maintain:
>
>     args = {
>         "@date1": today,
>         "@date2": Sybase.OUTPUT(date),
>         "@symbol": "IBM",
>     }
>     conn = db.pool.get_connection()
>     output = conn.callproc(conn.cursor, "previous_trading_day", args)

Hmm, the method should be defined on the cursor, not the connection for DB-API 2.0 compatibility. I guess the module still uses the old and deprecated DB-API 1.0 approach.

> In this case, the @date2 key represents an output parameter, and our
> wrapper rewrites that value. I don't know if we do this in our
> wrapper because the python-sybase authors intended not to do this
> rewriting, or if I'm working around an actual bug. The input side of
> in/out parameters is (I presume) passed in through the call to
> Sybase.OUTPUT(...).
>
> In a separate email thread, Marc-Andre suggested that we could add a
> callfunc() method to Cursor objects. I'm not entirely sure that's
> necessary, as (at least in the Sybase case) a stored procedure can
> return all three types of values in the same call.
>
> So, to draw this exceedingly long mail to a close, I propose:
>
> 1. PEP 249 should document how stored procedure return values are made
> available to the caller of Cursor.callproc.

Agreed, we need to address this in some way.

Since the .callproc() signature is already defined to not return a procedure/function return value, my proposal was to introduce a new method .callfunc() which does support this.

There are a few ways this could be done. The most intuitive is probably this one:

Variant A:
----------

    return_value = cursor.callfunc(funcname, parameters)

Unlike the .callproc() method, this call would not support in/out or output parameters. It would still support creating result sets, though.
A less intuitive alternative would be this one:

Variant B:
----------

    return_value_and_result_parameters = cursor.callfunc(funcname, parameters)

with return_value_and_result_parameters being a sequence of the form [return_value, parameter0, parameter1, ...], i.e. the return_value is prepended to the parameters list.

This would also support in/out and output parameters.

More Pythonic:

Variant C:
----------

    (return_value, result_parameters) = cursor.callfunc(funcname, parameters)

This would also support in/out and output parameters and make it easy to separate the return_value from the result_parameters.

> 2. PEP 249 should more precisely document how output and in/out
> parameters are specified and processed.

Agreed as well :-)

At the moment, the DB-API leaves these details to the database modules to figure out.

--
Marc-Andre Lemburg

From mal at python.org Wed Mar 26 01:22:35 2014
From: mal at python.org (M.-A. Lemburg)
Date: Wed, 26 Mar 2014 01:22:35 +0100
Subject: [DB-SIG] cursor.callfunc() (was: More tightly defining Cursor.callproc in PEP 249)
In-Reply-To: <5330BA7F.7050403@egenix.com>
References: <5330BA7F.7050403@egenix.com>
Message-ID: <53321DCB.80708@python.org>

On 25.03.2014 00:06, M.-A.
Lemburg wrote:
>> [Skip]
>> So, to draw this exceedingly long mail to a close, I propose:
>>
>> 1. PEP 249 should document how stored procedure return values are made
>> available to the caller of Cursor.callproc.
>
> Agreed, we need to address this in some way.
>
> Since the .callproc() signature is already defined to not return
> a procedure/function return value, my proposal was to introduce a
> new method .callfunc() which does support this.
>
> There are a few ways this could be done. The most intuitive is
> probably this one:
>
> Variant A:
> ----------
>
> return_value = cursor.callfunc(funcname, parameters)
>
> Unlike the .callproc() method, this call would not support
> in/out or output parameters. It would still support creating
> result sets, though.
>
> A less intuitive alternative would be this one:
>
> Variant B:
> ----------
>
> return_value_and_result_parameters = cursor.callfunc(funcname, parameters)
>
> with return_value_and_result_parameters being a sequence of the form
> [return_value, parameter0, parameter1, ...], i.e. the return_value
> is prepended to the parameters list.
>
> This would also support in/out and output parameters.
>
> More Pythonic:
>
> Variant C:
> ----------
>
> (return_value, result_parameters) = cursor.callfunc(funcname, parameters)
>
> This would also support in/out and output parameters and allow to
> easily separate the return_value from the result_parameters.

So which of those would you prefer? Or perhaps someone has an alternative proposal which looks better?

>> 2. PEP 249 should more precisely document how output and in/out
>> parameters are specified and processed.
>
> Agreed as well :-)
>
> At the moment, the DB-API leaves these details to the database
> modules to figure out.
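To compare the three proposed shapes side by side, here is a small sketch. Note that callfunc() is only a proposal in this thread and not part of PEP 249; StubCursor, its _invoke() helper and the callfunc_a/b/c method names are hypothetical. The pretend "database" returns 0 and doubles the second parameter, which plays the role of an in/out parameter.

```python
class StubCursor:
    """Hypothetical cursor used only to show the three return shapes."""

    def _invoke(self, funcname, parameters):
        # Simulated call: return value 0, second parameter doubled in place
        # of whatever a real stored function would write back.
        result_parameters = list(parameters)
        result_parameters[1] = result_parameters[1] * 2
        return 0, result_parameters

    def callfunc_a(self, funcname, parameters):
        # Variant A: return value only; output parameters are not exposed.
        return_value, _ = self._invoke(funcname, parameters)
        return return_value

    def callfunc_b(self, funcname, parameters):
        # Variant B: return value prepended to the result parameters.
        return_value, result_parameters = self._invoke(funcname, parameters)
        return [return_value] + result_parameters

    def callfunc_c(self, funcname, parameters):
        # Variant C: (return_value, result_parameters) pair.
        return self._invoke(funcname, parameters)


cursor = StubCursor()
shape_a = cursor.callfunc_a("some_function", [1, 21])
shape_b = cursor.callfunc_b("some_function", [1, 21])
shape_c = cursor.callfunc_c("some_function", [1, 21])
```

With Variant B the caller has to remember that index 0 is the return value and that every parameter index shifts by one; Variant C keeps the two apart at the cost of always unpacking a pair.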
--
Marc-Andre Lemburg
Director
Python Software Foundation
http://www.python.org/psf/

From tlocke at tlocke.org.uk Wed Mar 26 20:38:29 2014
From: tlocke at tlocke.org.uk (Tony Locke)
Date: Wed, 26 Mar 2014 19:38:29 +0000
Subject: [DB-SIG] Prepared statements in python drivers
In-Reply-To:
References:
Message-ID:

Hi Daniele, the latest release of pg8000 (1.9.7) has (experimental) support for prepared statements, but takes an implicit approach rather than an explicit one. In the connect() function there's a boolean use_cache parameter which tells pg8000 to cache prepared statements, keyed against the SQL query string. So when you do:

    cursor.execute(sql_query, params)

it does a lookup on sql_query in a local cache, and executes the prepared statement if one is found, or creates and caches a new one if not. This implicit approach means that no extension to the DB-API is needed.

To address some of your points:

> A cursor may prepare automatically a statement and then execute it, but this adds a network roundtrip, uses more server resources and can lead to suboptimal plans (because the planner doesn't know the parameter so can't decide about a filter selectivity etc.).

Internally, pg8000 uses prepared statements for everything, because it always uses the extended protocol rather than the simple protocol. This means the roundtrip is always done, so caching is always better from a roundtrip point of view in pg8000.

Also, regarding your point about suboptimal server plans, I believe that this was true until PostgreSQL 9.3. In 9.3 the query plan may be changed on each execution of a prepared statement.

> A cursor may support a single prepared
> statement or many, in which case a cache invalidation policy could be
> needed etc.

With caching turned on, pg8000 stores the cache at the connection level. The cache is never invalidated, and prepared statements are never explicitly closed.
When the connection is closed, postgres will close any prepared statements associated with that session.

As I say, the caching of prepared statements is still at an experimental stage in pg8000, and any feedback is very welcome.

Cheers,

Tony.

On 24 March 2014 16:53, Daniele Varrazzo wrote:
> Hello,
>
> lately there has been some interest in adding prepared statements
> support in Psycopg. It's a feature of which I see the usefulness but
> which I haven't used extensively enough to make my mind about be the
> best interface to present it to the driver clients.
>
> A toy prototype that hasn't lead to a great deal of discussion is
> described in this article:
> .
> This implementation is explicit: the cursor has a prepare() method and
> execute can be run without the statement, only with the parameters,
> which would call the prepared statements.
>
> Other implementations are possible of course. A cursor may prepare
> automatically a statement and then execute it, but this adds a network
> roundtrip, uses more server resources and can lead to suboptimal plans
> (because the planner doesn't know the parameter so can't decide about
> a filter selectivity etc.). A cursor may support a single prepared
> statement or many, in which case a cache invalidation policy could be
> needed etc.
>
> I was wondering a few things:
>
> - is there enough consensus - not only in the Python world - about how
> to implement a prepared statements interface on a db driver?
> - do other Python drivers implement stored procedures? Do they do it
> in a uniform way?
> - is the topic generic enough for the DB-SIG to suggest a DB-API
> interface or is it too database specific and the interface would be
> better left to the single driver?
>
> Thank you very much for any help provided.
>
> -- Daniele
> _______________________________________________
> DB-SIG maillist - DB-SIG at python.org
> https://mail.python.org/mailman/listinfo/db-sig

From mal at egenix.com Thu Mar 27 11:28:12 2014
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 27 Mar 2014 11:28:12 +0100
Subject: [DB-SIG] Prepared statements in python drivers
In-Reply-To:
References:
Message-ID: <5333FD3C.3050005@egenix.com>

On 26.03.2014 20:38, Tony Locke wrote:
> Hi Daniele, the latest release of pg8000 (1.9.7) has (experimental)
> support for prepared statements, but takes an implicit approach rather
> than an explicit one. In the connect() function there's a boolean
> use_cache parameter which tells pg8000 to cache prepared statements,
> keyed against the SQL query string. So when you do:
>
> cursor.execute(sql_query, params)
>
> it does a lookup on sql_query in a local cache, and executes the
> prepared statement if one is found, or creates and caches a new one if
> not. This implicit approach means that no extension to the DB-API is
> needed.

Hi Tony,

thanks for the feedback.

Please note that one of the main reasons for having a separate explicit API to prepare a statement is to run the query parser on the statement (and prepare the query plan on the server) without actually executing the statement. This can be used to e.g. set up a pool of cursors with already prepared statements for faster execution of commonly used queries.

Your approach implements the standard caching mechanism that is already documented in the DB-API. It only extends this by caching not only the last statement, but a set of statements, if I understand correctly. The pool creation is also possible with this approach, but only after the cursors were used at least once.

I don't think the use case is important enough to make .prepare() a requirement in the DB-API, but it would be great if we could come up with a standard extension definition.
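The implicit approach discussed above can be sketched as a cache keyed by the SQL string. This is only an illustration under stated assumptions, not pg8000's actual code: StatementCache, fake_prepare and the ("prepared", sql) handle are hypothetical names, and the server round trip is simulated by appending to a list.

```python
class StatementCache:
    """Hypothetical connection-level cache of prepared statements."""

    def __init__(self, prepare):
        self._prepare = prepare      # callable: sql text -> prepared handle
        self._statements = {}        # sql text -> prepared handle

    def get(self, sql):
        # Prepare on first use only; later lookups reuse the stored handle.
        if sql not in self._statements:
            self._statements[sql] = self._prepare(sql)
        return self._statements[sql]


prepare_calls = []


def fake_prepare(sql):
    """Stand-in for the server round trip that parses and plans a statement."""
    prepare_calls.append(sql)
    return ("prepared", sql)


cache = StatementCache(fake_prepare)
first = cache.get("select * from mytable where x = ?")
second = cache.get("select * from mytable where x = ?")
```

In this model nothing is prepared until a statement is first requested, which is the limitation raised above: a pool can only be warmed by actually running each query once, unless the driver also exposes an explicit prepare step.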
--
Marc-Andre Lemburg

From mal at egenix.com Thu Mar 27 11:46:14 2014
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 27 Mar 2014 11:46:14 +0100
Subject: [DB-SIG] Prepared statements in python drivers
In-Reply-To: <5333FD3C.3050005@egenix.com>
References: <5333FD3C.3050005@egenix.com>
Message-ID: <53340176.5070303@egenix.com>

On 27.03.2014 11:28, M.-A. Lemburg wrote:
> I don't think the use case is important enough to make .prepare()
> a requirement in the DB-API, but it would be great if we could
> come up with a standard extension definition.

Here's a start:

"""
.prepare(operation)

Prepare a database operation (query or command) without executing it, e.g. to check for syntax errors, determine the parameter count or initialize the cursor for subsequent calls to the .execute*() methods.

The prepared operation string is retained by the cursor to allow executing the operation without having to prepare the operation again.

In order to benefit from this caching, the .execute*() methods must be run with the same operation string that was passed to the .prepare() method.

The call to .prepare() closes any pending result sets on the cursor.
The prepared operation is only available until the next call to one of the .execute*() methods or another call to the .prepare() method.

Return values are not defined.
"""

--
Marc-Andre Lemburg

From fog at dndg.it Thu Mar 27 11:52:27 2014
From: fog at dndg.it (Federico Di Gregorio)
Date: Thu, 27 Mar 2014 11:52:27 +0100
Subject: [DB-SIG] Prepared statements in python drivers
In-Reply-To: <53340176.5070303@egenix.com>
References: <5333FD3C.3050005@egenix.com> <53340176.5070303@egenix.com>
Message-ID: <533402EB.6080104@dndg.it>

On 27/03/2014 11:46, M.-A. Lemburg wrote:
> The prepared operation is only available until the next call to
> one of the .execute*() methods or another call to the .prepare()
> method.

With this wording it seems that the prepared statement is invalidated by the next .execute() call, while you can call .execute() multiple times with the same (prepared) query string.

federico

--
Federico Di Gregorio federico.digregorio at dndg.it
Di Nunzio & Di Gregorio srl http://dndg.it
Do I know what a rhetorical question is?
 -- Homer Simpson
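The reading Federico describes can be made concrete with a small model. This is a sketch under assumptions, not mxODBC or any driver's code: PreparingCursor and prepare_count are hypothetical, the command attribute follows the mxODBC description earlier in the thread, and the server-side prepare is simulated by a counter. The prepared operation survives repeated .execute() calls with the same string and is replaced only when a different operation is executed or prepared.

```python
class PreparingCursor:
    """Hypothetical cursor stub modelling the draft .prepare() extension."""

    def __init__(self):
        self.command = None      # last prepared or executed operation string
        self.prepare_count = 0   # number of simulated server-side prepares

    def prepare(self, operation):
        # A real driver would send the operation to the server for parsing
        # and planning here, and close any pending result sets.
        self.prepare_count += 1
        self.command = operation

    def execute(self, operation, parameters=()):
        # Reuse the retained statement only when the same string is passed.
        if operation != self.command:
            self.prepare(operation)
        # A real driver would now bind the parameters and run the statement.


cursor = PreparingCursor()
cursor.prepare("select * from mytable where x = ?")
cursor.execute(cursor.command, [1])
cursor.execute(cursor.command, [2])   # same string: no second prepare
prepares_after_reuse = cursor.prepare_count

cursor.execute("select * from othertable")   # different string: prepared anew
prepares_after_change = cursor.prepare_count
```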