[DB-SIG] pyformat Parameter Style

Magnus Lyckå magnus at thinkware.se
Thu May 15 01:24:51 EDT 2003


This is a detail, but I think it's misleading to support
%s but not %f etc. On the other hand, things happen with
the parameters passed to SQL statements, so it might be
misleading with %s etc either way. But it's the strings
that are changed. The numeric values are treaded just as
python programmers would expect.

Why not use ? and :name instead of %s and %(name)s if the
driver is to determine how the value is translated anyway.

Is it *only* to make it easier to write modules? Perhaps
someone could write an SQL "?" parser and donate that to
the DB-API community, and we can all use qmark instead?

There can't be so terribly many cases where ? occurs in
an SQL statement without being a parameter marker, can
there?

Chris Cogdon wrote:
>When one is passing data to the SQL backend, it should be up to the 
>BACKEND to determine how it is passed, and by that, I mean the python 
>interface to the backend. Take this example:
>
>Say I want to store a float into a row. The current way, which works, 
>would be to do this:
>
>cursor.execute ( "insert into data ( %s )", my_float )

To anyone used to Python this looks like we are passing in str(my_float)
while we are in fact passing in repr(my_float).

So if the backend is capable of storing exact decimal values and
we try to store 0.1, we will accidentally store 0.10000000000000001.
It's likely that the backend won't be configured to so many digits,
and will round it to 0.1 anyway, but I don't really like this.

It's not obvious as Python usually is. With
cursor.execute("insert into data ( ? )", my_float) or
cursor.execute("insert into data ( :1 )", my_float) I don't
have such a preconception. They do't give me any "cast to
string" message.

After all, it *must* be the application programmer who decides what value
to pass to the backend. If we have paramstyle = format, why not allow him
to express this as "... x=%.2f", my_float) instead of
"... x=%s", round(my_float, 2)) if that is what he wants?

The problem here is not with %f or %d, but with %s and string values!
See below.

>The interface sees the floating point value, turns that into the 
>'appropriate' representation that the SQL syntax accepts, and then passes 
>it to the DBMS. Now, it is 'expected' that the interface knows what the 
>'appropriate' format is, and does what's Right. Not only should the 
>application writer not know, and not be forced to worry about it, but he 
>shouldn't be ALLOWED to worry about it. For example:

I disagree. If a database application programmer is not aware of how
floating point numbers work in Python and in his backend on the given
platform, he is likely to cause a mess. I've cleaned up such things,
both in SQL code and in C++ code in various projects. There is no way
the driver can save anyone from doing stupid things like comparing
floats on equality without any fuzzfactor. The programmer *has* to
worry about float representations. This doesn't imply that he needs
full format capability in the SQL strings, but he can't let the
computer handle that in any programming language that uses floating
point arithmetic in the way C and Python does. (Not that it's fully
problem free with exact decimal representations either.)

And it seems to me that the purpose of using %s and only %s instead of
? or :1 which is the norm in SQL is only to make life easier for the
driver programmer, rather than to protect the applicaiton programmer.
This is *not* without a risk for the applicaiton programmer, as we will
see.

>cursor.execute ( "insert into data ( %2.2f )", my_float )

To anyone used to python it should be clear what this implies.
Do you mean that DB application programmers by accident enter 2.2
without knowing what it means? I find that unlikely.

I suppose it's possible that someone would type %f and not
realize that this is the same as %.6f though. That *is* a
risk. I can appreciate that.

But today, you must use %r to be able to get exactly the same
behaviour across database drivers, since the drivers that are
complient with the spec will use %s if you tell them to do
that, and others like pysqlite will use repr() whatever you
type after the %-sign.

By the way, I think all the drivers in question that stops our
ignorant application programmer from writing something potentially
harmful as ("insert into data ( %2.2f )", 1000./3.) will gladly
allow him to write something decidedly harmful such as
("insert into data ( %2.2s )", 1000./3.) and silently store 33
in the database instead of 333.33 which had been the case with
%2.2f. At least that is what pysqlite does. :(

Why on earth is that permitted, if the purpose is to protect
the application programmer from mistakes?

Obviously, it's not enough that the application programmer
understands how floats work. He needs to understand how float
works, how python format strings work, and the odd ways they
are used in various database drivers... Of course you could say
"Only use %s, don't put in any other things like %.2s, but if
this is the intention, why write the code so that the user can
make such a mess.

Perhaps these modules should throw an exception as soon as % is
followed by anything else than "s", "%" or "(word)s" ? That's
where the problems really are, since the drivers escape and
quote the strings, which makes them look different then they
do in the scope of the client application.

Also, the implementations of %s in for instance pysqlite chokes
on things like "select * from x where a=%s and b like 'hello%'"
You have to use 'hello%%', or you will get a
"TypeError: not enough arguments for format string"

In other words, it seems that a %s-limited format or %(xxx)s limited
pyformat isn't there to help the application programmer, since it has
dangers that ?, :1 or :xxx doesn't have, but is instead there just to
make life somewhat simpler for the driver programmer.

Here are other ways to do strange things with %s:

 >>> cur.execute('create table y ( a int, b char(50))')
 >>> cur.execute('insert into y values (%s, %s||%s)', (1, 'hi', 'there'))
 >>> cur.execute('insert into y values (%s, %.2s||%s)', (2, "'secret", 
"there"))
 >>> cur.execute('insert into y values (%s, %.1s||%.1s)', (3, "hi", "there"))
 >>> cur.execute('insert into y values (%s, %.4s)', (4, "hi''there"))
 >>> cur.execute('select * from y')
 >>> cur.fetchall()
[(1, 'hithere'), (2, 'there'), (3, '||'), (4, 'hi')]

I do think that cursor.execute("insert into data (%.2f)", my_float)
might be a reasonable thing to do, and to disallow that while
allowing the rubbish I did here, can hardly be for the sake of
the application programmer... :)



--
Magnus Lycka (It's really Lyckå), magnus at thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program 




More information about the DB-SIG mailing list