[DB-SIG] In praise of pyformat

Sat Aug 11 21:32:53 CEST 2007

On Sat, 11 Aug 2007 18:14:16 +0200 Paul Boddie <paul at boddie.org.uk> wrote:
> On Saturday 11 August 2007 10:25, Mike Meyer wrote:
> >
> > Well, InformixDB has a way that works. I can think of at least two
> > other databases that it won't work for - or at least, will have
> > surprising consequences for users. That's not the same thing as the
> > paramstyle having a way.
> Can you name those database systems? The DB-API should support parameters 
> without additional quoting. If, say, PostgreSQL doesn't support parameters 
> internally, the module should prevent the programmer from knowing about that.

A number of open source db systems allow both single and double quotes
for literals, which breaks the "not inside apostrophes" rule that was
proposed. And from what's in the dbsig archives, informix uses ':' in
a couple of different places.

> > I do agree that the module authors ought to have to deal with this
> > rather than the users. But letting each module author do it however
> > they feel is best only helps users who are writing for a specific
> > module; users trying to write code that's portable between modules
> > and/or databases are better served by a specification that applies to
> > all modules, even if it adds the burden of flagging significant
> > characters as literal.
> 
> As you may have seen, I wrote some code which attempts to mark parts of 
> queries as literal and non-literal text:
> 
> http://www.python.org/pypi/sqlliterals
> 
> This should help module developers know whether a parameter marker is genuine 
> or not.

I'm not all that worried about what module developers have to do - I'm
more worried about what I, as a user of the module, have to do. If I
need to understand your sqlliterals code to use parameter quoting to
guard against data insertion attacks, have I really gained anything?

> > If a new version of the PEP is going to require supporting a parameter
> > style - which I believe is a good thing - it should be one that's at
> > least as explicit as the most explicit of the current parameter
> > styles. Which means the PEP needs to lay out the rules for when the
> > parameters are recognized as such and when they aren't.
> You previously asked the following question: "How do the other paramstyles 
> deal with wanting to get their significant character into the query?" The 
> answer is that at the application level you never have the problem of 
> getting, for example, a question mark into the query: it's either inside a 
> string literal, meaning that it's protected from any interpretation as a 
> parameter marker (1), or it's supplied as part of a value which is passed to 
> the database system in association with a parameter (2).

Actually, I think you can get a bit tighter than that. They aren't
merely "part of a value", they are a token in the value. I.e., if you
define token as "characters valid in a name plus ? and :" then any
token that isn't either a single ? or .startswith(':') isn't a
parameter. I'm not sure how that works with the informix uses, though.

> And where a question mark (or other parameter marker) appears outside a string 
> literal, it's unambiguously interpreted as a parameter marker since there 
> should be no other interpretation of that character outside literals.

Except there are cases where that's not so. As previously mentioned,
informix apparently uses ':' for a couple of things. And consider this:

>>> c.execute("insert into 'FOO' ('ID') values ('hello')")
<sqlite3.Cursor object at 0x55ed40>

This query has three string literals. Let's start replacing them...

>>> c.execute("insert into 'FOO' ('ID') values (?)", ['hello'])
<sqlite3.Cursor object at 0x55ed40>

ok so far.

>>> c.execute("insert into 'FOO' (?) values (?)", ['ID', 'hello'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: near "?": syntax error

Whoops.

>>> c.execute("insert into ? ('ID') values (?)", ['FOO', 'hello'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: near "?": syntax error

And again it doesn't substitute.

So these literal strings aren't values? Even though in some cases I'm
forced to quote them (and the cases vary from db to db, and how you
quote them varies, and .....)?

Or maybe with cx_Oracle:

>>> c.execute("""select count(*) from log_metric where "METRIC_STRING" = 'pages swapped out'""")
[<cx_Oracle.NUMBER with value None>]
>>> c.fetchall()
[(189,)]
>>> c.execute("""select count(*) from log_metric where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING'))
[<cx_Oracle.NUMBER with value None>]
>>> c.fetchall()
[(0,)]

Hmm. Wrong answer. Let's try a different tack:

>>> c.execute("""select count(*) from log_metric where ":colname" = 'pages swapped out'""", dict(colname='METRIC_STRING'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number

Nope. And for completeness:

>>> c.execute("""select count(*) from :tabname where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING', tabname="LOG_METRIC"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cx_Oracle.DatabaseError: ORA-00903: invalid table name
>>> c.execute("""select count(*) from :tabname where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING', tabname='"LOG_METRIC"'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cx_Oracle.DatabaseError: ORA-00903: invalid table name
>>> c.execute("""select count(*) from ":tabname" where :colname = 'pages swapped out'""", dict(colname='METRIC_STRING', tabname="LOG_METRIC"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number

Uh... ugh.

> If one considers parameters as locations for string substitution, then I can 
> see how your question arises

I'm looking at trying to use it in automatically generated SQL
code. For example, I want to use an sql database as the "desktop" half
of pda database. The pda database is designed to go well with the pda,
so table/field names have different rules, and I have to deal with
that. Or (the need that's got me looking into this) I'm logging errors
from an application via a systems monitoring tool, and the table/field
names have to work with the systems monitoring tool. So the SQL -
including the table and column names in the definition - is getting
built on the fly. I can see that if I were writing the queries by hand
and knew all the table/column names in advance, none of this would
matter. But I'm not, so it does.

      <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.