[DB-SIG] paramstyle specification (was: Two sample implementations of ['named', 'qmark'] auto switch (a report))

Daniele Varrazzo daniele.varrazzo at gmail.com
Wed May 22 02:04:44 CEST 2013


On Tue, May 21, 2013 at 7:30 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> However, escaping doesn't strike me as one of the important
> details, since I have yet to find an example where you'd
> actually need escaping :-)

Uhm, I thought I'd shown a few of them. But they may have been
scattered around in the long thread and spinoffs. So, my spanners in
the machine are:

> First of all, SQL string literals should not be subject to
> marker parsing. They must be skipped and care must be taken
> to make sure that the '-quoting is being respected ('' -> ').

I've shown in a separate thread that postgres defines other string
literals than the 'quoted' ones. For instance the multiline ones,
using $$these$$ as delimiters, that may also be named and nested.
There are of course comments too to take care of. And "quoted"
identifiers. And on and on. Even the layman knowledge that quotes are
escaped in sql by doubling them is shaken by the postgres parser: the
string "a'b" can be expressed as E'a\'b'. In short, parsing postgres
syntax requires a well built tokenizer on the client, an excessive
requirement for a generic client-side library (especially one that is
not supposed to "understand" the query being sent but whose only role
is to pass it to the server).

> * qmark case: question marks don't appear outside SQL string
>   literals, so you don't need to escape them - they will
>   always refer to binding markers

In postgres ? is a valid operator, or part of valid operators. The
following query reports there are 30 such operators predefined in
postgres 9.1:

    =# select format('%s %s %s', oprleft::regtype, oprname, oprright::regtype)
    from pg_operator where oprname ~ E'\\?';

    abstime <?> tinterval
    path ?# path
    box ?# box
    ...

There are hstore operators
<http://www.postgresql.org/docs/9.1/static/hstore.html#AEN135244>,
geometric operators
<http://www.postgresql.org/docs/9.1/static/functions-geometry.html>,
and extension modules can -and do- add others
<http://pgmp.projects.pgfoundry.org/mpz.html#division-operators-and-functions>.

> * named case: it should be enough to make sure that the colon
>   of a named marker is not preceded by another colon
>   (negative look-behind). If it is, the potential substring
>   is not a named marker, e.g. "abc::integer < 3" is not a named
>   marker, "abc < :integer" is a named marker

If this works, I can offer a life jacket proposing to always use the :
as escape symbol, escaping ? as :? Don't know yet if it's valid: I'm
not practical with negative look-behind parsing. Definitely doesn't
look a familiar way to escape stuff.

> However, I think we can leave those deliberately underspecified
> in DB-API 3.0 to both encourage use of one of the mandatory
> paramstyles, and to allow database module authors to continue
> exposing natively used paramstyles.
>
> Something we do need to address in a paramstyle spec for
> qmark and named is SQL comments. Perhaps easiest would be
> to disallow them in SQL statements passed to .execute*().

It is definitely an "easy" route, but it leaves us with drivers that
by spec may forbid the use of certain database features: I feel it an
excessive price to pay in the name of consistency.

Again, sorry if I'm repetitive, but note that many of the problems
stem by a sort of implied feature request: that a driver can support
both qmark and named placeholders. I see how a driver supporting both
sequences and maps as parameters is handier than one only offering a
single one, but I think that precisely the qmark/named pair is a
dreadful combination to force into cohabitation, and this is the main
force leading to unnatural requirements such as the impossibility to
express some of the valid characters in the query. Even taking the
hard case of postgres: if qmark was the only chosen style, single
qmarks would become placeholders and double qmarks would become single
(as operators, in literals, in comments, everywhere, without the need
of a lexer). If instead only named was the chosen one people should
write casts as a::::int, but the system would be provable correct.
It's when both are allowed that escaping becomes an hairy thing.

If the dbapi3 requirement becomes to mandate support for a single
paramstyle things become manageable again.


-- Daniele


More information about the DB-SIG mailing list