[DB-SIG] API 3.0 limiting paramstyle to ['named', 'qmark'] is okay. ('format' is not desirable)

Michael Bayer mike_mp at zzzcomputing.com
Sun May 19 05:54:15 CEST 2013

On May 18, 2013, at 10:22 PM, Daniele Varrazzo <daniele.varrazzo at gmail.com> wrote:

> My impression from this long thread is that we are in front of
> bikeshedding in its purest form: I haven't read a rationale yet to
> justify the rewrite of *every query* written in Python against MySQL,
> PostgreSQL and god knows what other database server. The justification
> has been just a tad more than "we like the ?". The feature
> specification is "let's copy Java".

as well as ODBC.

> Well, I've looked into that too: JDBC specifies *only* qmark, not
> named.
> Named is only supported by *a different* object (a wrapper, I
> understand): JDBC does not define a grammar where both qmark and named
> can be used.

believe it or not I'd prefer to *just* have qmark rather than continue to support six different paramstyles, two of which have an awkward overlap with Python string formatting (read: "format" and "pyformat" present the *most* problems of all the formats, due to their easy overlap with Python's built in formatting).

> And unsurprisingly so: the grammar is ambiguous:
> cur.execute("select $$a ? and a :foo$$", args)
> This is a valid postgres query. If args is an object implementing
> Mapping and Sequence ABCs there's no way to disambiguate that. Now go
> on and tell me that we could double the ? even when the :foo are used:
> we are making up that syntax now, because JDBC hasn't event tried. Oh,
> and in the placeholder with unary "?" operator (valid in postgres)
> "???": is the operator prefix or suffix?

I'd suggest we spend a little time seeing how these two universally used formats handle this.   The ? and :format styles have been in use for decades; I'm sure we can find adequate precedents for how escaping should be done.  We've already failed to specify things unambiguously with percent signs, I can show you a good handful of SQLAlchemy tests that we now mark as "failing" only for MySQL-python and psycopg2, no other DBAPIs - since at some point both DBAPIs changed how when they decide to handle % signs and when they don't, and it was just too much of an edge case for me to bother reacting again (tests that specifically involve putting a % sign in a table or column name - who the heck would do that?   beats me, but we had users reporting it).

I also have had some very annoying bugs where SQL statements being placed into error messages found their way into log lines, which then caused logging to fail because Python's logging also implicitly expects %s/%(name)s symbols in the strings, the strings had the wrong number of symbols and the logging of errors failed.    I've also had confused users who were trying to format string with %d, %f, etc. expecting that the database driver would interpret the types correctly.

It's specifically this overlap with a very common formatting system in Python that causes issues.  The strings we send to our databases are specifically *not* being sent to a Python interpreter, and this is the deeper reason why I feel that %s/%(name)s is less appropriate than a database-dedicated format.

> My bottom line is that it is provable that a spec mandating
> qmark/named is broken,  and psycopg will never implement it: not only
> for its technical inferiority but above all because there is no real
> request for it,

There's a reason you've never had these requests - people that actually write software using direct DBAPI access in my experience have no idea that other DBAPIs do things differently (and typically, they're newcomers to Python and often application-level programming overall anyway).   Anyone who starts writing a new application in a *second* DBAPI, is shocked to see how this new DBAPI is nothing like the first DBAPI they used so much, realizes immediately how impossible it is to write cross-DBAPI code, starts using a system like SQLAlchemy, and you never hear from them.   I hear from them, especially those poor users that have "%" signs in their DDL definitions and trip over the "provable" inconsistencies present in the current % sign implementations.

A bigger issue here, that's interesting to consider, is that I think you agree that there should only be two paramstyles, one name-based and one positional - how come the needs of the cx_oracle, pyodbc, or SQLite DBAPIs, who don't consistently use format/pyformat and who also would be greatly concerned about backwards compatibility and the inconvenience of change, are less valid than those of psycopg2?    It seems like moving from 6 to 2 paramstyles means lots of DBAPIs need to change no matter what.    Surely psycopg2, who is by far the most on the ball with changes and staying up to date could do the best job of making this transition smooth ..... just a thought.

More information about the DB-SIG mailing list