[DB-SIG] pyformat Parameter Style

Thu May 15 16:26:01 EDT 2003

On Thursday, May 15, 2003, at 13:34 US/Pacific, Magnus Lyckå wrote:

> At 16:03 2003-05-14 -0700, Chris Cogdon wrote:
>> I (Magnus) wrote about %s:
>>> To anyone used to Python this looks like we are passing in 
>>> str(my_float)
>>> while we are in fact passing in repr(my_float).
>>
>> Actually, that's not true. Python's quoting rules are NOT the same as 
>> most DMBS's quoting rules. For example the following:
>>
>> cursor ( "select blah from data where thingy=" + repr(thingy_value) )
>>
>> would only work in some circumstances. It would NOT work if, say, 
>> thingy_value contains a '  (python would then use " as a string 
>> quoter, which means something very different to the DBMS than single 
>> quotes), or if thingy_value contains foreign characters (python 2.2 
>> uses hex escapes, while most DMBSes require octal escapes).
>
> And how often is there a ' in a float? :)
>
> This is my point. All the magic is in strings, and possibly
> in types like date, but there is *no* magic in the numeric
> types, so allowing numeric values to be passed in with %f or
> %i etc seems ok. Why disallow this? Practically, I'm suggesting
> (for pysqlite):

Actually, there IS magic in floats. You have rules for how exponents 
are represented, whether or not a leading 0 is required before the 
decimal point, whether or not a leading + is allowed, or only -, and so 
on. Again, the rules for a DMBS may not be the same as the rules for 
Python.

Often, yes, they're similar enough that we can just use 
'str(your_float)' and pass the output straight into the SQL query. My 
argument is that the only person qualified to make that statement is 
the interface writer. The application writer should not need to know.

> (One might consider using "type(value) in [long, int, float]"
> instead, since subclasses might do odd things on for instance
> __str__, but that's also a problem with the current code.)

Well, not necessarily. The interface knows the incoming type[1] and 
knows how to present it so that the backend is happy.

[1]: Taking a OO-related tangent here, even in the case of a subclassed 
'long' for example, the 'OO rules' state that derivitave classes must 
have all the properities of the base class, but are allowed to restrict 
the values at the base class level. For example, to take a stupid 
example, I may create a 'odd_long' class that is a derivitave of the 
'long' class were all values must be odd. It still remains true that a 
'odd_long' IS a long for all possible values of 'odd_long', and also 
always has a valid representation in 'long's space.

> What would be the problem with this? %s will still work, but with
> 12 decimals instead of 17 for floats. I doubt that is a problem,
> and %e or %.16e or %r will still work. If the application programmer
> is insecure about the result, he can simply try it out with a
> normal formatted print.

But, again, this is pushing the responsibility of 'object to string' 
conversion onto the application writer, which he's not supposed to 
worry about. That's what the interface is for.

> I think it would be very useful to get the parsed SQL statements
> printed in some kind of debug mode, since strings can look different
> than one might expect. That is a feature I'd like to see.

Most databases have a debugging mode that output the SQL statements 
that are being issued to the backend.

I agree that it would be neat to have a 'hook' in the DB interface to 
do this, in the cases where this is not supplied, or you want to keep 
it all in python anyway.

> How on earth would the interface be able to know whether we
> actually want to store 0.1 or 0.10000000000000001 when a
> Python float can't distinguish them?

Because, the interface writer is expected to know the capabilities of 
both the backend, AND of Python. That's why it's called an 'interface' 
:)

> For exact decimal numbers, we have a mismatch between Python
> and SQL, since Python is lacking such a datatype, and after
> some years of Python programming I feel that explicit is
> better than implicit. I.e. I prefer that I control this
> conversion rather than to let the interface do that. (The
> problem is still strings. I can accept that the interface
> escapes and quote strings, because the escape and quote stuff
> is not to be stored in the db, it's just a way to make the
> backend accept exactly what I try to store, but it's a bit
> annoying that things like %.5s will fail.)

The 'correct' solution in this case is to have a 'fixed_point' type 
with the capabilities you require.

>> I agree that there may need to be an exception, or a clarification 
>> made, in the case of floats. However, this is also true when doing 
>> straight-out python coding. It is 'risky' to code something like the 
>> following:
>>
>> f = read_float_value_from_somewhere()
>> if f = 0.2:
>>     do_something_important ()
>>
>> This is risky because 'f' cannot ever possibly be exactly 0.2
>
> That is a syntax error, but f == 0.2 will work since 0.2 can't
> be exactly 0.2 either! :) Try it! (And try "0.1 + 0.1 == 0.2".)

Sorry for the syntax error :)  Yes, 0.1+0.1 == 0.2 does what's 
expected, but in other cases, it doesn't work. Viz:

 >>> 0.2 + 0.4 == 0.6
0

> It more complex than this, and I won't bore you with my old
> "war stories", but my point is that the application programmer
> might get into situations where he needs to be able to have as
> much control as possible, and I've not heard a good reason so
> far to remove that control as some db interfaces completely
> needlessly does today.

In that case, it's perfectly possible to form the SQL query completely 
yourself, and then pass it to execute without using the % syntax or 
parameters.

> If you have a decimal(10,2) column X with 1.15 stored, and
> do "...WHERE X=%s", (1.15)) you will get
> ...WHERE X=1.1499999999999999 today. Are you sure that will
> match? If you did "...WHERE X=%.2f", (1.15)) and my patch
> was applied, you'd get ...WHERE X=1.15. That would certainly
> match. I imagine that if it is decimal(10,2) as I said, the
> float passed in will be coersed into a decimal(10,2) and it
> will work anyway, but I think I can show other cases where
> this *is* a problem if I have to. Probably rare, but why
> remove the simple tool to fix it when it happens?

Okay, that's a good example. The difference here is that the interface 
does not necessarily have access to the data types IN THE DATABASE 
which would be required to make the above example work as expected. I'd 
still like to see the other issues addressed, though (including the 
inconsistent use of %r).

Perhaps a 'more correct' solution is to extend, in a limited fashion, 
which % escapes are allowed and which will return 'behaviour 
undefined'. (I think %r falls squarely in the behaviour undefined 
category)

> It would be much more pythonic is it *was* true. :)
> It's a really ugly hack as long as it's not true.

Well, not necessarily. SQL is not python. Since we're trying to convert 
python concepts to non-python concepts, there's going to be a lot of 
give and take. The classic example is how strings are presented in both 
worlds. There are a lot of similarities, but they're NOT equivalent.

Additionally, to me, just because a syntax uses '%s' style escapes, it 
does NOT necessarily mean that I get to use all the other escapes too. 
The documentation, even as it stands, only predicates the use of '%s' 
and '%(keyname)s'.

> One of the really great things with python is that it
> almost always works just as expected. I can learn something
> in one context and apply it in another. Other languages I
> use are full of "you can't do this with that" and "in that
> situation you will have to do it like this instead". Perl
> and Visual Basic for instance is full of such rubbish.

Well, pyPgSQL does what I expect, too, because I read the docs :)

>> Most interfaces run through a quoting function before being applied 
>> to the string containing the '%s's
>
> Only for strings,

This is not true, and I re-cite pyPgSQL.

> and that is a temporary decoration needed
> to get exactly the value in the original string passed to the
> backend! Changing the string
>
> Can't do
>
> into
>
> 'Can''t do'
>
> is just a temporary measure to make certain that we will
> actually use
>
> Can't do
>
> and neither more, nor less, in the actual SQL operation. I don't
> think that's more odd than the fact that the string "can" will
> have the representation 'can', and "can't" will have the
> representation "can't" in Python. That *is* an interface issue,
> and a feature for the coder.

This isn't totally correct, either. Python's repr() will turn 'high bit 
escapes' into \xnn with the newer releases of python, and this breaks 
the SQL, which only allowed octal escapes.

Again, just because two escaping mechanisms LOOK similar, it doesn't 
mean they ARE :)

> Yes it does. Read again. It only uses %s in the examples,
> but read the text. It seems M-A and I agree that it could
> be made clearer, but it *does* state "ANSI C printf format
> codes" and "Python extended format codes", and the tiny
> example is just that. Did you expect "name" being passed
> in as a flot or what? ;)

It depends on what is meant by 'python extended format codes'. Does it 
mean the whole gamut of %s %r %f, and so on, or does it JUST mean the 
capability to use the (keyname) extension to use by-name rather than 
by-position parameter passing. My initial interpretation, and that 
supported both by the pypg and pyPgSQL interfaces (not much experience 
with MySQL, sorry) is that the latter interpretation has been made.

>> And, again, '%r's actual MEANING is very poorly defined. repr() does 
>> not return a value that will always be accepted by the DMBS. It 
>> requires extra quoting, and then should be be quoting and THEN repr, 
>> or the other way around ?
>
> I don't qute understand the precious sentences. But the
> interfaces in question use repr() as default. See _quote
> above. That's not *my* idea.

And, perhaps they shouldn't, as repr() does not always return a value 
that is valid SQL syntax.

>> Most likely because the interface writer is just using the '%' 
>> operator to do the insertion of the parameters into the string, 
>> without 'stopping' the application writer from doing things like 
>> '%2.2s', and so on. There are many other instances where the 
>> behaviour outside of the specification is not well defined, but does 
>> not return an error.
>
> My point was mainly to point out that it's odd to block %f for
> no appearent good reason and allow people to write things like
> %.2s and thus trunctate something which is probably different
> than the applicaiton programmer intended.

They're not blocking %f just as they're not blocking %.2s. My argument 
is that the python % operator is being used because it's convenient, 
but returns inconsistent results of you step outside the spec's 
predication (my interpretation of it) to only use %s and %(keyname)s.

> I also think that several people have misunderstood the
> specification, even if it's explicit about allowing the
> ANSI C format codes if you read it properly.

The DB-API 2.0 spec makes NO mention of ANSI C format codes ... that's 
just an implication :)

-- 
    ("`-/")_.-'"``-._        Chris Cogdon <chris at cogdon.org>
     . . `; -._    )-;-,_`)
    (v_,)'  _  )`-.\  ``-'
   _.- _..-_/ / ((.'
((,.-'   ((,/   fL