[DB-SIG] In praise of pyformat

Sat Aug 11 23:10:34 CEST 2007

On Sat, 2007-08-11 at 15:32 -0400, Mike Meyer wrote:
> On Sat, 11 Aug 2007 18:14:16 +0200 Paul Boddie <paul at boddie.org.uk> wrote:
> > On Saturday 11 August 2007 10:25, Mike Meyer wrote:
> > >
> > > Well, InformixDB has a way that works. I can think of at least two
> > > other databases that it won't work for - or at least, will have
> > > surprising consequences for users. That's not the same thing as the
> > > paramstyle having a way.
> > Can you name those database systems? The DB-API should support parameters 
> > without additional quoting. If, say, PostgreSQL doesn't support parameters 
> > internally, the module should prevent the programmer from knowing about that.
> 
> A number of open source db systems allow both single and double quotes
> for literals, which breaks the "not inside apostrophes" rule that was
> proposed. And from what's in the dbsig archives, informix uses ':' in
> a couple of different places.

Apparently I've confused you by simplifying the situation.

The iron-clad, all-encompassing, golden rule is this: If something looks
like a parameter marker and occurs in a place where a parameter marker
is syntactically allowed, it's a parameter marker. Anything else isn't.
And in case that's not clear, parameter markers are not allowed inside
string literals or delimited column names.

Informix does use colons as part of the syntax in two places:
1) Datetime literals e.g. DATETIME(12:34:56) HOUR TO SECOND
2) Remote table references of the form databasename:tablename.

To protect from false positives, the parser in the InformixDB API sees
colons that follow alphanumeric characters as literal parts of the
query, and not as parameter placeholders.

The basic rule above still holds true, because you can't have a
parameter placeholder inside a datetime literal or in a table name. The
parser just uses a heuristic because writing a full SQL parser would be
insane.

> [...]
> > And where a question mark (or other parameter marker) appears outside a string 
> > literal, it's unambiguously interpreted as a parameter marker since there 
> > should be no other interpretation of that character outside literals.
> 
> Except there are cases where that's not so. As previously mentioned,
> informix apparently uses ':' for a couple of things. And consider this:
> 
> >>> c.execute("insert into 'FOO' ('ID') values ('hello')")
> <sqlite3.Cursor object at 0x55ed40>
> 
> This query has three string literals. Let's start replacing them...

The fact that the query works with string literals for table names and
column names is mildly surprising. That must be an Sqlite peculiarity.
It's definitely not standard SQL.

> >>> c.execute("insert into 'FOO' ('ID') values (?)", ['hello'])
> <sqlite3.Cursor object at 0x55ed40>
> 
> ok so far.
>
> >>> c.execute("insert into 'FOO' (?) values (?)", ['ID', 'hello'])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> sqlite3.OperationalError: near "?": syntax error
> 
> Whoops.

Whoops indeed. A column name must be an identifier. Parameter markers
are not identifiers. See the golden rule above.

> >>> c.execute("insert into ? ('ID') values (?)", ['FOO', 'hello'])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> sqlite3.OperationalError: near "?": syntax error
> 
> And again it doesn't substitute.

Again, table names must be identifiers, yada yada.

Also, I object to your use of the word "substitute", since it implies
that parameter binding is a string formatting exercise. In most database
engines, parameters are not substituted into the query text. Instead,
the parameter values are "bound" to the placeholders in the query and
transmitted to the database separately from the query.

> So these literal strings aren't values? Even though in some cases I'm
> forced to quote them (and the cases vary from db to db, and how you
> quote them varies, and .....)?
> 
> Or maybe with cx_Oracle:
> 
> >>> c.execute("""select count(*) from log_metric where "METRIC_STRING" = 'pages swapped out'""")
> [<cx_Oracle.NUMBER with value None>]
> >>> c.fetchall()
> [(189,)]
> >>> c.execute("""select count(*) from log_metric where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING'))
> [<cx_Oracle.NUMBER with value None>]
> >>> c.fetchall()
> [(0,)]
> 
> Hmm. Wrong answer. Let's try a different tack:

Right answer, wrong question. In the first case, you're counting where
the contents of the column METRIC_STRING equal 'pages swapped out'. In
the second case, you're counting where the string 'METRIC_STRING' (bound
as a parameter value) equals 'pages swapped out'. Different question,
different answer.

> >>> c.execute("""select count(*) from log_metric where ":colname" = 'pages swapped out'""", dict(colname='METRIC_STRING'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number
> 
> Nope. And for completeness:

That's because ":colname" is not a parameter placeholder, it is a
delimited column name, referencing a non-existent column
called :colname. Additionally, you're supplying a parameter value that
has no corresponding placeholder, which cx_Oracle doesn't allow.

> >>> c.execute("""select count(*) from :tabname where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING', tabname="LOG_METRIC"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> cx_Oracle.DatabaseError: ORA-00903: invalid table name
> >>> c.execute("""select count(*) from :tabname where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING', tabname='"LOG_METRIC"'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> cx_Oracle.DatabaseError: ORA-00903: invalid table name
> >>> c.execute("""select count(*) from ":tabname" where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING', tabname="LOG_METRIC"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number
> 
> Uh... ugh.

And all of these examples are covered by my previous explanations.

> I'm looking at trying to use it in automatically generated SQL
> code. For example, I want to use an sql database as the "desktop" half
> of pda database. The pda database is designed to go well with the pda,
> so table/field names have different rules, and I have to deal with
> that. Or (the need that's got me looking into this) I'm logging errors
> from an application via a systems monitoring tool, and the table/field
> names have to work with the systems monitoring tool. So the SQL -
> including the table and column names in the definition - is getting
> built on the fly. I can see that if I were writing the queries by hand
> and knew all the table/column names in advance, none of this would
> matter. But I'm not, so it does.

Well, as you hopefully know by now, in real SQL databases you can't use
parameter binding to get a table name or column name into a query. The
explanation for this is relatively simple. One major reason for the
existence of dynamic parameters is the ability to run prepared queries:
Let the database parse and plan the query once, and then execute the
query many times over with different actual values. Since the table
names and column names affect the query plan, they must not be supplied
by parameters.

The fact that (py)format implementations of DB-API generally do allow
parameter markers for identifiers is an implementation artifact of using
string formatting to achieve poor-man's parameter binding.

To execute a query where table names and column names are variable, you
should use string formatting yourself to build the structure of the
query, placing parameter markers into the structure where necessary, and
then using parameter binding to supply the actual values.

For example:

def insert_row(cur, tablename, valuesdict):
  columns = valuesdict.keys()
  params = [':%s'%c for c in columns]
  query = "insert into %s(%s) values (%s)" % (
     tablename,
     ','.join(columns),
     ','.join(params))
  cur.execute(query, valuesdict)

Hope this helps,

-- 
Carsten Haese
http://informixdb.sourceforge.net