[DB-SIG] pyformat Parameter Style

Thu May 15 20:13:27 EDT 2003

On Thursday, May 15, 2003, at 18:08 US/Pacific, Magnus Lyckå wrote:

> I'm afraid Chris and I might end up in kill files if
> we prolong this more... ;)

Whee! It would be a first! :)

> Can you give a real world example of this in existing
> databases? The SQL standard allows leading + or decimal
> point. At least Date/Darwen use both in ISBN 0201964260
> p 84 when they describe numeric literals.

No, I can't. However, I'm not willing to make any kind of assumption 
here.

The interface writer is in the best position to know if the backend 
requires special treatment. But... that doesn't stop them using the 
normal python % operator if he knows it'll accept anything that python 
will spit out, or making the appropriate adjustments if there are some 
exceptions.

>> Often, yes, they're similar enough that we can just use 
>> 'str(your_float)' and pass the output straight into the SQL query. My 
>> argument is that the only person qualified to make that statement is 
>> the interface writer. The application writer should not need to know.
>
> If I am to write an application for a certain RDBMS with the
> DB-API I have to know how this deviates from the SQL standard
> on issues like string concatenation, join syntax, what aggregate
> functions it has, what datatypes I can use, to what extent it
> supports things like sub-selects or updatable views etc, but I
> shouldn't be burdened with knowing how to write floats so that
> they are understood by the database? I don't buy that.

That would imply that ALL SQL coding decisions are best left to the 
application programmer, including how to quote and embed parameters. I 
don't think you're arguing that, though, since I think we both think 
the existing parameter substitution mechanism is just too damn 
convenient :)

The existing system (IMHO) is a compromise between getting the 
application programmer to do everything, and defining a 'SQL-neutral' 
language that the interface programmer then has to turn into 'real' SQL 
for the backend. Ie, the existing interface says "you need to know what 
kind of SQL will work for your circumstance, but I'll handle the 
parameter passing for you".

I STILL believe the better way to handle something like floats is to 
create a fixed_point type. However, I'm willing to concede that 
allowing extended formatting for the particular case of floats, to 
combat the precision issues of floating point, is a suitable compromise.

>> [1]: Taking a OO-related tangent here, even in the case of a 
>> subclassed 'long' for example, the 'OO rules' state that derivitave 
>> classes must have all the properities of the base class, but are 
>> allowed to restrict the values at the base class level. For example, 
>> to take a stupid example, I may create a 'odd_long' class that is a 
>> derivitave of the 'long' class were all values must be odd. It still 
>> remains true that a 'odd_long' IS a long for all possible values of 
>> 'odd_long', and also always has a valid representation in 'long's 
>> space.
>
> Python doesn't enforce that. AFAIK is no real consensus among
> the gurus on whether subclassing means restriction or extension
> or both.

There's a lot of things that python doesn't enforce :) But... you're 
right, this is off-topic.

>> But, again, this is pushing the responsibility of 'object to string' 
>> conversion onto the application writer, which he's not supposed to 
>> worry about. That's what the interface is for.
>
> I can see your point in theory, but do you really claim that
> this is a problem in the real world?

No, not YET. However, when you're designing architectures, you need to 
play with 'non real world' examples, or you get bit later on.

>> The 'correct' solution in this case is to have a 'fixed_point' type 
>> with the capabilities you require.
>
> Which we might have in Python 2.4... A little too far off
> in time for me... :)

Nothing stopping us writing our own types/classes. Many interfaces do 
this already for date/time types.

> Yes. The interface knows neither database types or application
> context, and it couldn't. It simply draws a conclusion on the
> combination of python type and backend engine, and I don't think
> that's always enough.

The interface DOES know about application context, based on what types 
it's being passed. If you want to do something special, then you create 
your own type like many DBIs currently do.

> But why not allow %r to work as usual? For me, one of the best
> things with python is that it's fairly orthogonal and transparent,
> and while it does make it easy to do the right thing and to
> understand what is happening, it rarely stops me from doing
> things that could be potentially harmful. I am allowed to take
> responsibility for my code.

But, if we're allowing exceptions for %s (ie, it quotes for you based 
on what the backend requires) then we're never going to achieve 
orthogonality, nor transparency. My argument is that the parameter 
passing mechanism was never INTENDED to be orthogonal to the python '%' 
operator, but it was just merely a convenient, and available mechanism 
to use.

>>>> Most interfaces run through a quoting function before being applied 
>>>> to the string containing the '%s's
>>>
>>> Only for strings,
>>
>> This is not true, and I re-cite pyPgSQL.
>
> What other datatypes than strings are quoted in pyPgSQL?

The _quote function does a lot more than just quote strings. In effect, 
it's really a 'convert python data type to what SQL expects' function, 
and it always returns a string, which is why, in pyPgSQL, the following 
fails:

cursor. execute ( "select * from stuff where thingy=%d", int_value )

because 'int_value' is converted to a string (and, just happens to use 
repr(), but, converting a string does NOT use repr()... not directly, 
anyway)

So... in effect... ALL data-types are _quoted, but not necessarily 
quoted :)

>> This isn't totally correct, either. Python's repr() will turn 'high 
>> bit escapes' into \xnn with the newer releases of python, and this 
>> breaks the SQL, which only allowed octal escapes.
>
> That broke ReportLab, I know. I was obviously not very clear.
> I tried to say that I see the database interfaces fiddling with
> "'%s'" % value.replace("'", "''") as something similar to what
> repr() does to strings in a python context. I.e. it's not about
> changing the content of the string, but rather about presenting
> in particular way in a context, so that it will be able to be
> understood by the receiver.

Well, yes, that's my point. :)

> In other words, I feel that with the change in _quote() that I
> suggested pysqlite would feel much more intuitive for people
> who are used to the python %-operator. If you have three
> variables i = 5, f = 1.2 and s = "Hello ma'm", you can just do
> cur.execute('INSERT INTO X VALUES (%i, %f, %s)', (i, f, s))

The problem is... once you say you can use %i, %f and %s (and I've 
already agreed that there are good reasons to allow the flexibility in 
%d), then people are going to assume that you can use %r (which I've 
already argued doesn't make ANY sense, since repr() does not return 
values that are acceptable SQL), and %x (many DMBSes will only accept 
decimal integers).

> Then we need to understand that as a consequence of this, doing
> things like %.4s won't work well due to our "padding". But when
> it comes to things like perhaps using %f for a field that might
> be an integer or None, or using %f for a very small value, we
> are on very familiar territory. It's just the same problems with
> print fmt % values. All the things we know from the %-operator
> will be useful here.

Well, there's another problem. In python %f will raise a TypeError if 
you try to pass it None :)

That, plus the extra quoting done with %s, are arguments against 
pretending that the parameter passing mechanism is orthogonal to 
python. It just isn't. And making it more so will create EXTRA 
confusion unless we change the documentation to state what is and isn't 
allowed.

As far as DBI-2.0 is concerned, we're a little stuck because of the 
vagueness. I think it would be unwise to make many of the DBI's out 
there 'incorrect' by saying "oh, we really meant for you to support 
%f".  For the moment, it might be wise to state that the only 
formatting code that is predicated by the spec is %s, but some 
databases will allow you to use %f as you might expect, as well as %i. 
The interpretation for %r I think is even more varied.

Adding in support for %f formatting (specifically to cover the case of 
fixed-point numbers) I think is best left for 2.1

>> They're not blocking %f just as they're not blocking %.2s. My 
>> argument is that the python % operator is being used because it's 
>> convenient, but returns inconsistent results of you step outside the 
>> spec's predication (my interpretation of it) to only use %s and 
>> %(keyname)s.
>
> They *are* blocking the *use* of %f by running all
> numbers though repr() for no good reason. ;)

IF you take a particular interpretation of the specification, that ALL 
parameters should use %s, then running it through repr is REQUIRED. 
(otherwise, %s will give you str() instead of repr(), and that will 
return non-accurate results in some circumstances, notably floats)

>> The DB-API 2.0 spec makes NO mention of ANSI C format codes ... 
>> that's just an implication :)
>
> The current version certainly does:

My bad. Momentary brain fart.

To summarise:

I agree that either the use of %f, or additional data types, be 
mandated to cope with the fixed_point problem.

I disagree that we should push for more orthogonality in the parameter 
passing, since %s does not work the same way as python we will never 
achieve that orthogonality.

I argue that we really need to define the documentation better, 
specifying a 'must support' behaviour that encompasses the existing 
interpretations that %s should be able to handle all data types, plus 
the option to allow other formatting options in a 'known' behaviour. 
The 2.0 spec be updated to state that the only option that has 
'guaranteed' support is %s, and then we work on whether we want to add 
in things like %f and %i for a later specification of the documentation.

-- 
    ("`-/")_.-'"``-._        Chris Cogdon <chris at cogdon.org>
     . . `; -._    )-;-,_`)
    (v_,)'  _  )`-.\  ``-'
   _.- _..-_/ / ((.'
((,.-'   ((,/   fL