[DB-SIG] API suggestion: expose 'quote' method

David Rushby davidrushby at yahoo.com
Thu Jun 5 13:00:29 EDT 2003


--- Chris Cogdon <chris at cogdon.org> wrote:
> 
> On Thursday, Jun 5, 2003, at 09:18 US/Pacific, M.-A. Lemburg wrote:
> 
> > Not quite: the driver/database prepares the statement, i.e. it parses
> > the SQL, finds out which data types the bound parameters need, and
> > then passes this information on to the application.
> 
> I know that the postgresql and mysql drivers glean the datatype
> information only from the data being passed as parameters.

Yes, but that approach is overly simplistic.

There are cases where the same Python object received as a parameter value
could require dramatically different internal representations based on the
context to which it's being submitted.  Database arrays are an example. 
Witness the following description of the PgArray class from the current
version of pyPgSQL:
"""
# Description:	A Python wrapper class for PostgreSQL array.
#		It is used so that the list type can be used as an arg-
#		ument to Connection.execute() without being treated as
#		a PostgreSQL array.
"""

PgArray is an ad hoc, UserList-like wrapper around a built-in list.  The sole
purpose of the class, as far as I can tell, is to allow the driver to
distinguish between:
a) an incoming list of distinct parameters bound for several fields
b) an incoming list bound for a database array field

In a database engine that provides parameter type information via the native
client API, such a hack would not be necessary.
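The ambiguity that PgArray resolves can be sketched in a few lines of Python.
This is an illustrative stand-in, not pyPgSQL's actual code; the names
ArrayParam and classify_params are invented for the example:

```python
# Hypothetical sketch of the ambiguity PgArray resolves (names are
# invented for illustration; this is not pyPgSQL's actual code).

class ArrayParam:
    """Wrapper marking a list as a single database-array value."""
    def __init__(self, items):
        self.items = list(items)

def classify_params(params):
    """Label each incoming parameter the way a driver would have to."""
    labels = []
    for p in params:
        if isinstance(p, ArrayParam):
            labels.append("array")       # one array-typed field
        elif isinstance(p, list):
            labels.append("ambiguous")   # bare list: several scalar
                                         # parameters, or one array value?
        else:
            labels.append("scalar")
    return labels
```

With parameter type metadata from the engine, the driver could resolve the
"ambiguous" case itself, and no wrapper class would be needed.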

As another example, what if an incoming string parameter contains unprintable
characters, such as the null character?  The database might be capable of
storing such a value if it were *bound* as a string, but not if it's
submitted as a *literal* in a SQL statement.
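A minimal sketch of the problem; the naive_literal helper is hypothetical
and merely mimics the quote-doubling a driver might perform when
interpolating a value into SQL text:

```python
# Hypothetical sketch: 'naive_literal' mimics the quote-doubling a
# driver might use when interpolating a string into SQL text.  An
# embedded NUL survives into the statement text, where many engines
# would reject or silently truncate it; a *bound* parameter travels
# out-of-band (typically length-prefixed), so no textual form is needed.

def naive_literal(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"

value = "abc\x00def"
literal = naive_literal(value)
assert "\x00" in literal  # the NUL is now raw inside the SQL text
```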

As another example, what about date and time localization?  Some database
engines have variable modes of localization for dates and times.  If the
localization mode can vary on a per-field or per-table basis rather than only
on a database-wide basis, then in order to generate a locale-correct string
representation of an incoming date parameter, the Python driver must parse
the SQL statement, determine the fields involved, and look up their metadata.
By contrast, *bound* dates and times are typically submitted to the
database engine in a locale-independent manner, or else the metadata
structures provided by the native client API describe the expected
representation, so that client drivers can convert to it.
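To make the contrast concrete, here is a hypothetical sketch; the per-field
format table and both helpers are invented for illustration:

```python
# Hypothetical sketch: the per-field format table and both helpers
# are invented for illustration.  A literal rendering needs per-field
# metadata; a bound value can use one canonical encoding.

import datetime

FIELD_DATE_FORMATS = {                    # imagined per-field metadata
    "orders.ship_date":  "%d.%m.%Y",      # e.g. a German-style field
    "orders.entry_date": "%m/%d/%Y",      # e.g. a US-style field
}

def as_literal(field, d):
    # Literal path: must know which field the value is headed for.
    return d.strftime(FIELD_DATE_FORMATS[field])

def as_bound(d):
    # Bound path: one locale-independent encoding for every field.
    return d.isoformat()

d = datetime.date(2003, 6, 5)
```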

As another example, some databases have data types that cannot be represented
as literals--they must be stored and retrieved using specialized API calls. 
Database arrays and/or blobs are perhaps the most common cases.  Arguably,
the inability to represent all datatypes literally is a flaw in the
engine--one which PostgreSQL and MySQL do not (AFAIK) suffer from.  However,
there may be good reasons for this limitation, as with partial, stream-based
I/O of blobs, or I/O of specific slices of database arrays.

> I know that the postgresql and mysql drivers gleen the datatype 
> information only from the data being passed as parameters.
> Do any other drivers get this from other places, such as the sql or 
> from information back from the database?

As far as I know, most other drivers obtain type information for parameters
from the database engine via the native client API.  The driver submits the
parameterized SQL string to the engine; the engine parses it and returns both
a prepared binary representation of the SQL statement and a metadata
structure containing parameter information.
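That round-trip can be sketched as follows; engine_prepare is a stand-in
for a native client API call, and the returned structure is purely
illustrative:

```python
# Sketch of the prepare round-trip described above.  'engine_prepare'
# is a stand-in for a native client API call; the returned structure
# is purely illustrative, not any particular engine's API.

def engine_prepare(sql):
    """Parse once; return an opaque handle plus parameter metadata."""
    n_params = sql.count("?")                   # one '?' marker per parameter
    return {
        "handle": id(sql),                      # opaque statement handle
        "param_types": ["UNKNOWN"] * n_params,  # a real engine would report
    }                                           # e.g. INTEGER, VARCHAR(20)

meta = engine_prepare("insert into t (a, b) values (?, ?)")
```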

> >> Is that right?
> >> If so, that seems to be a particularly bad performance hit, almost 
> >> not worth the savings in sending binary data. (Unless I'm missing 
> >> something, of course :)
>
> > Huh ? Where's the performance hit ? The more meta-data you have
> > on the client side the better and faster the whole process can
> > be made. That's also why drivers usually keep a cache of
> > schema information on the client side.
> 
> It was only a performance hit if the driver was waiting for a response 
> from the database before sending the data.

It's a one-time, up-front performance hit that pays off dramatically over the
course of numerous executions of the same parameterized statement.  As long
as the SQL string submitted to the Cursor.execute method remains the same as
the previously executed SQL string, the prepared statement and the metadata
can be reused without additional overhead--only the parameter values vary.  

For example, here's the relevant section of kinterbasdb's Cursor.execute
implementation (from _kicore_execution.c):
--------------------------------------
static int _prepare_statement_if_necessary(
    CursorObject *cursor, PyObject *sql
  )
{
  [...]
  PyObject *prev_sql = cursor->previous_sql;
  if (prev_sql != NULL) {
    /* If the PyObject pointers point to the same memory location, the two
    ** objects are certainly equal.  If the pointers refer to different
    ** memory locations, the two objects might still be equal if their
    ** contents match. */
    if (sql == prev_sql || PyObject_Compare(sql, prev_sql) == 0) {
      cursor->_state = CURSOR_STATE_OPEN;
      return 0;
    }
  }
  [... else, ask the engine to prepare a new statement ...]
}
--------------------------------------

The overhead is very low in repeated executions of the same SQL string--just
a pointer comparison, if the current SQL string and the previous one are the
same Python string object.
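The same check can be written out in Python (an illustrative analogue, not
kinterbasdb's actual implementation):

```python
# Illustrative Python analogue of the re-prepare check: prepare only
# when the incoming SQL differs from the previously prepared SQL.
# This is a sketch, not kinterbasdb's actual code.

class Cursor:
    def __init__(self):
        self.previous_sql = None
        self.prepare_count = 0  # instrumentation, for illustration only

    def _prepare(self, sql):
        # Stand-in for asking the engine to prepare a new statement.
        self.prepare_count += 1
        self.previous_sql = sql

    def execute(self, sql, params=()):
        # 'sql is self.previous_sql' mirrors the C pointer comparison;
        # the '==' fallback mirrors PyObject_Compare.
        if not (sql is self.previous_sql or sql == self.previous_sql):
            self._prepare(sql)
        # ... bind params and execute the prepared statement ...

cur = Cursor()
stmt = "insert into t (a) values (?)"
for row in [(1,), (2,), (3,)]:
    cur.execute(stmt, row)  # prepared once, executed three times
```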



