[DB-SIG] In praise of pyformat

Sun Aug 12 00:33:27 CEST 2007

On Sat, 11 Aug 2007 17:10:34 -0400 Carsten Haese <carsten at uniqsys.com> wrote:

> On Sat, 2007-08-11 at 15:32 -0400, Mike Meyer wrote:
> > On Sat, 11 Aug 2007 18:14:16 +0200 Paul Boddie <paul at boddie.org.uk> wrote:
> > > On Saturday 11 August 2007 10:25, Mike Meyer wrote:
> > > >
> > > > Well, InformixDB has a way that works. I can think of at least two
> > > > other databases that it won't work for - or at least, will have
> > > > surprising consequences for users. That's not the same thing as the
> > > > paramstyle having a way.
> > > Can you name those database systems? The DB-API should support parameters 
> > > without additional quoting. If, say, PostgreSQL doesn't support parameters 
> > > internally, the module should prevent the programmer from knowing about that.
> > A number of open source db systems allow both single and double quotes
> > for literals, which breaks the "not inside apostrophes" rule that was
> > proposed. And from what's in the dbsig archives, informix uses ':' in
> > a couple of different places.
> Apparently I've confused you by simplifying the situation.

No, you haven't confused me - you've just simplified things. I'm
trying to get things *precise*, not simple. When you simplified, you
lost precision. 

> The iron-clad, all-encompassing, golden rule is this: If something looks
> like a parameter marker and occurs in a place where a parameter marker
> is syntactically allowed, it's a parameter marker. Anything else isn't.

Ok, I'll accept that that's iron-clad and all-encompassing. It's also
circular. I suspect it also changes depending on the underlying
database.

> And in case that's not clear, parameter markers are not allowed inside
> string literals or delimited column names.

In every dbapi 2 module for every available database? Or just most of
them?

> The basic rule above still holds true, because you can't have a
> parameter placeholder inside a datetime literal or in a table name. The
> parser just uses a heuristic because writing a full SQL parser would be
> insane.

Let's see if I've got this: the exact rule is that you only do binding
where a parameter is allowed. Figuring out where a parameter is
allowed requires a large enough portion of a full SQL parser that
writing one would be insane. So doing what I'm asking - providing a
precise rule - would be insane. I think the key word is "insane".

> > [...]
> > > And where a question mark (or other parameter marker) appears outside a string 
> > > literal, it's unambiguously interpreted as a parameter marker since there 
> > > should be no other interpretation of that character outside literals.
> > 
> > Except there are cases where that's not so. As previously mentioned,
> > informix apparently uses ':' for a couple of things. And consider this:
> > 
> > >>> c.execute("insert into 'FOO' ('ID') values ('hello')")
> > <sqlite3.Cursor object at 0x55ed40>
> > 
> > This query has three string literals. Let's start replacing them...
> 
> The fact that the query works with string literals for table names and
> column names is mildly surprising. That must be an Sqlite peculiarity.
> It's definitely not standard SQL.

sqlite allows you to use either single and double quotes for string
literals and identifier delimiters.

> Also, I object to your use of the word "substitute", since it implies
> that parameter binding is a string formatting exercise. In most database
> engines, parameters are not substituted into the query text. Instead,
> the parameter values are "bound" to the placeholders in the query and
> transmitted to the database separately from the query.

Fair enough - it is a lot closer to binding than string substitution.

> > So these literal strings aren't values? Even though in some cases I'm
> > forced to quote them (and the cases vary from db to db, and how you
> > quote them varies, and .....)?
> > 
> > Or maybe with cx_Oracle:
> > 
> > >>> c.execute("""select count(*) from log_metric where "METRIC_STRING" = 'pages swapped out'""")
> > [<cx_Oracle.NUMBER with value None>]
> > >>> c.fetchall()
> > [(189,)]
> > >>> c.execute("""select count(*) from log_metric where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING'))
> > [<cx_Oracle.NUMBER with value None>]
> > >>> c.fetchall()
> > [(0,)]
> > 
> > Hmm. Wrong answer. Let's try a different tack:
> 
> Right answer, wrong question. In the first case, you're counting where
> the contents of the column METRIC_STRING equal 'pages swapped out'. In
> the second case, you're counting where the string 'METRIC_STRING' (bound
> as a parameter value) equals 'pages swapped out'. Different question,
> different answer.

The answer isn't what I wanted, and hence wrong. The bug is in my
code - it doesn't ask the question I wanted to ask.

> To execute a query where table names and column names are variable, you
> should use string formatting yourself to build the structure of the
> query, placing parameter markers into the structure where necessary, and
> then using parameter binding to supply the actual values.

But isn't one of the arguments against pyformat/format is that they
lead to people doing string formatting to build the query, which is a
bad thing because, unless done very carefully, they leave you exposed
to all kinds of data injection attacks? If the other styles wind up
*requiring* you to build the query with string formatting, how can
they possibly be considered superior?

     <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.