[DB-SIG] In praise of pyformat
mwm-keyword-dbsig.588a7d at mired.org
Sun Aug 12 19:07:17 CEST 2007
On Sun, 12 Aug 2007 02:30:45 +0200 Paul Boddie <paul at boddie.org.uk> wrote:
> On Sunday 12 August 2007 00:33, Mike Meyer wrote:
> > On Sat, 11 Aug 2007 17:10:34 -0400 Carsten Haese <carsten at uniqsys.com>
> > >
> > > The iron-clad, all-encompassing, golden rule is this: If something looks
> > > like a parameter marker and occurs in a place where a parameter marker
> > > is syntactically allowed, it's a parameter marker. Anything else isn't.
> > Ok, I'll accept that that's iron-clad and all-encompassing. It's also
> > circular. I suspect it also changes depending on the underlying
> > database.
> How is the above circular? If we have something, for example "?", which looks
> like a parameter marker (because we've defined "?" to be a parameter marker),
> and it appears in a place where such parameter markers are syntactically
> allowed/possible/recognised (see the SQL specifications available on the
> Internet), then our software notes that "?" in that place is a parameter
> marker. Otherwise, the occurrence of "?" just forms part of some other
> sequence of characters and is not considered to be such a parameter marker by
> our software.
How is "We only recognize parameter markers where we recognize
parameter markers" *not* circular?
> > > To execute a query where table names and column names are variable, you
> > > should use string formatting yourself to build the structure of the
> > > query, placing parameter markers into the structure where necessary, and
> > > then using parameter binding to supply the actual values.
> > But isn't one of the arguments against pyformat/format is that they
> > lead to people doing string formatting to build the query, which is a
> > bad thing because, unless done very carefully, they leave you exposed
> > to all kinds of data injection attacks? If the other styles wind up
> > *requiring* you to build the query with string formatting, how can
> > they possibly be considered superior?
> Because for a lot of applications, people don't need to build up queries from
> parts, or if they do, they can do it without mixing in user-supplied values.
Sorry, not buying that one. "Most people don't need foo" isn't an
argument as to why something that doesn't do foo might be consider
superior to something that does.
> Distinguishing parameter usage from string substitution reduces confusion for
> beginners who don't tend to do such advanced stuff, and it makes them more
> aware of the issues when they finally start doing so.
I think you just pinpointed the problem: parameter substitution in
dbapi is being advertised as the solution to a problem it's not really
adequate to solve. IIUC, it's restricted by the underlying SQL
implementation (and inherits portability problems from there as
well). That some underlying SQL implementations may not support it at
all further complicates things.
So, in the spirit of having the module authors instead of the users do
the work, how about taking this burden away from execute/executemany,
and providing a tool that is adequate to the job.
Here's a clean slate design proposal:
execute's parameters are for access to the underlying SQL engines
parameter mechanism. It takes a string and a dict or list (just as it
does now). The string is passed to the database unchanged. executemany
is related to execute the same way it is now. paramstyles is a list of
strings indicating what parameters styles the underlying database
supports. The currently recognizes styles are 'qmark', 'named' and
'numeric'. An empty list means the database doesn't support parameter
substitution, and is a perfectly valid value. The module author
doesn't have to provide a mechanism for doing this so people can build
statements from untrusted data because we also provide:
build_statement is a tool for safely creating SQL statements from
untrusted data. The signature is build_statement(basestring, *args,
**kwargs). Providing both *args and **kwargs is
undefined. build_statement is a substitution mechanism, but not a
simple string substitution. Instead, the substitution markers indicate
where we substitute SQl tokens in the statement. Since tokens have
types that can't be determined from the type of the value - in
particular, a delimited identifier for a column name vs. a string
literal in an expression - we have to have type information for the
marker. So we're going to use the familiar %-notation to provide
it. The possible type indicaters are:
%s - produce a string literal. Values will be coerced to strings.
%d,f,g - the usual numeric substitutions.
%i - produce an identifier. Values must be strings.
%t - produce a time literal from a datetime.datetime or None. None
means to have the database substitute the current time when the
statement is executed.
Mike Meyer <mwm at mired.org> http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
More information about the DB-SIG