[DB-SIG] pyformat Parameter Style

Magnus Lyckå magnus at thinkware.se
Fri May 16 04:08:08 EDT 2003


I'm afraid Chris and I might end up in kill files if
we prolong this more... ;)

At 15:26 2003-05-15 -0700, Chris Cogdon wrote:
>Actually, there IS magic in floats. You have rules for how exponents are 
>represented, whether or not a leading 0 is required before the decimal 
>point, whether or not a leading + is allowed, or only -, and so on.

Can you give a real world example of this in existing
databases? The SQL standard allows leading + or decimal
point. At least Date/Darwen use both in ISBN 0201964260
p 84 when they describe numeric literals.

>Again, the rules for a DMBS may not be the same as the rules for Python.

That's not my point. The application coder must know the
SQL dialect he works with. If the format codes work like
a python coder would expect, with the only exception that
it allways escape and quote strings, he can predict what
will work for him on the given platform, and what won't.

>Often, yes, they're similar enough that we can just use 'str(your_float)' 
>and pass the output straight into the SQL query. My argument is that the 
>only person qualified to make that statement is the interface writer. The 
>application writer should not need to know.

If I am to write an application for a certain RDBMS with the
DB-API I have to know how this deviates from the SQL standard
on issues like string concatenation, join syntax, what aggregate
functions it has, what datatypes I can use, to what extent it
supports things like sub-selects or updatable views etc, but I
shouldn't be burdened with knowing how to write floats so that
they are understood by the database? I don't buy that.

>[1]: Taking a OO-related tangent here, even in the case of a subclassed 
>'long' for example, the 'OO rules' state that derivitave classes must have 
>all the properities of the base class, but are allowed to restrict the 
>values at the base class level. For example, to take a stupid example, I 
>may create a 'odd_long' class that is a derivitave of the 'long' class 
>were all values must be odd. It still remains true that a 'odd_long' IS a 
>long for all possible values of 'odd_long', and also always has a valid 
>representation in 'long's space.

Python doesn't enforce that. AFAIK is no real consensus among
the gurus on whether subclassing means restriction or extension
or both.

Some programming languages lets classes declare some methods
as "final" to prevent subclasses to override them, but Python
doesn't. But I think we're off topic now.

>But, again, this is pushing the responsibility of 'object to string' 
>conversion onto the application writer, which he's not supposed to worry 
>about. That's what the interface is for.

I can see your point in theory, but do you really claim that
this is a problem in the real world?

I've worked extensively with at least seven different RDBMS
brands in different versions in various project since 1990.
I've never ever run in to a problem in passing floats or other
numbers to any database. Not that I can remember at least. I
never heard of anyone else who had problems with that either.
I do remember a lot of other problems with databases though...

Strings and nulls, and possibly dates. This is where the problems
with datatypes lie in SQL, but I never heard of anyone failing
with numbers.

The float problem I've seen is "WHERE X = ?" with floats that
suddenly didn't work after a conversion from VAX/VMS to
PA-RISC/HP-UX... The old and new values didn't have the same
representation... But that is another issue.

>Most databases have a debugging mode that output the SQL statements that 
>are being issued to the backend.
>
>I agree that it would be neat to have a 'hook' in the DB interface to do 
>this, in the cases where this is not supplied, or you want to keep it all 
>in python anyway.

I think it would be nice to see what the SQL would look like
without actually trying to run it on the backend. There are
situations where you definitely want to inspect your SQL
before you run it, even if there is something called rollback...

>>How on earth would the interface be able to know whether we
>>actually want to store 0.1 or 0.10000000000000001 when a
>>Python float can't distinguish them?
>
>Because, the interface writer is expected to know the capabilities of both 
>the backend, AND of Python. That's why it's called an 'interface' :)

I don't think you understood me. If we store 0.1 or
0.10000000000000001 in a python float, they will be the same
from Python's point of view. But it's possible that the
backend could distinguish between them. The application
programmer can obviously know from context that in this case
it's really 0.1, and in that case it's really
0.10000000000000001, but there is no way the interface will
take these contexts into consideration. It's no big deal
really, you can just change cur.execute(sql, params) to
cur.execute(sql % params), but then we are on our own
regarding escaping/quoting. It seems confusing and silly
to be forced to alternate between these modes if we are
only passing in parameters in both cases.

>The 'correct' solution in this case is to have a 'fixed_point' type with 
>the capabilities you require.

Which we might have in Python 2.4... A little too far off
in time for me... :)

>In that case, it's perfectly possible to form the SQL query completely 
>yourself, and then pass it to execute without using the % syntax or parameters.

As I said, I want the escape/quote done for me. :)

>>If you have a decimal(10,2) column X with 1.15 stored, and
>>do "...WHERE X=%s", (1.15)) you will get
>>...WHERE X=1.1499999999999999 today. Are you sure that will
>>match? If you did "...WHERE X=%.2f", (1.15)) and my patch
>>was applied, you'd get ...WHERE X=1.15. That would certainly
>>match. I imagine that if it is decimal(10,2) as I said, the
>>float passed in will be coersed into a decimal(10,2) and it
>>will work anyway, but I think I can show other cases where
>>this *is* a problem if I have to. Probably rare, but why
>>remove the simple tool to fix it when it happens?
>
>Okay, that's a good example. The difference here is that the interface 
>does not necessarily have access to the data types IN THE DATABASE which 
>would be required to make the above example work as expected. I'd still 
>like to see the other issues addressed, though (including the inconsistent 
>use of %r).

Yes. The interface knows neither database types or application
context, and it couldn't. It simply draws a conclusion on the
combination of python type and backend engine, and I don't think
that's always enough.

>Perhaps a 'more correct' solution is to extend, in a limited fashion, 
>which % escapes are allowed and which will return 'behaviour undefined'. 
>(I think %r falls squarely in the behaviour undefined category)

I think it's up to the application programmer to know what
the result of a format code on a particular object will be,
whether he talks to a user, a text file for further data
preocessing or a database via SQL. The application programmer
is after all supposed to know both how the objects he work
with behave, and what SQL expects.

I'd like the default behaviour to be that format codes are
treated as with the %-operator, and that the exceptions are
documented. That would include escape/quote for string, proper
conversion for Date/Timestamp/Interval etc, and maybe for
Money classes etc. I guess Unicode needs special treatment as
well.

But why not allow %r to work as usual? For me, one of the best
things with python is that it's fairly orthogonal and transparent,
and while it does make it easy to do the right thing and to
understand what is happening, it rarely stops me from doing
things that could be potentially harmful. I am allowed to take
responsibility for my code.

>>>Most interfaces run through a quoting function before being applied to 
>>>the string containing the '%s's
>>
>>Only for strings,
>
>This is not true, and I re-cite pyPgSQL.

What other datatypes than strings are quoted in pyPgSQL?

>This isn't totally correct, either. Python's repr() will turn 'high bit 
>escapes' into \xnn with the newer releases of python, and this breaks the 
>SQL, which only allowed octal escapes.

That broke ReportLab, I know. I was obviously not very clear.
I tried to say that I see the database interfaces fiddling with
"'%s'" % value.replace("'", "''") as something similar to what
repr() does to strings in a python context. I.e. it's not about
changing the content of the string, but rather about presenting
in particular way in a context, so that it will be able to be
understood by the receiver.

In other words, I feel that with the change in _quote() that I
suggested pysqlite would feel much more intuitive for people
who are used to the python %-operator. If you have three
variables i = 5, f = 1.2 and s = "Hello ma'm", you can just do
cur.execute('INSERT INTO X VALUES (%i, %f, %s)', (i, f, s))

It's not so difficult to grasp the idea that the interface will
quote and escape strings for us, so we won't have to worry about
that. After all, it *is* the text

Hello ma'm

that we want to store in the database, not

'Hello ma''m'

The escaped and quoted thing is just padding during transport.

Then we need to understand that as a consequence of this, doing
things like %.4s won't work well due to our "padding". But when
it comes to things like perhaps using %f for a field that might
be an integer or None, or using %f for a very small value, we
are on very familiar territory. It's just the same problems with
print fmt % values. All the things we know from the %-operator
will be useful here.

>It depends on what is meant by 'python extended format codes'. Does it 
>mean the whole gamut of %s %r %f, and so on, or does it JUST mean the 
>capability to use the (keyname) extension to use by-name rather than 
>by-position parameter passing. My initial interpretation, and that 
>supported both by the pypg and pyPgSQL interfaces (not much experience 
>with MySQL, sorry) is that the latter interpretation has been made.

"codes" is plural. But I realize that there have been
different interpretations of this. That's what got me
started. I've heard people saying "only use %s" for
years, and never really thought so much about what the
spec said until M-A pointed out that all codes should
be working.

>And, perhaps they shouldn't, as repr() does not always return a value that 
>is valid SQL syntax.

But I don't think it's possible to create a much better
generic solution. I guess that sometimes the application
coder has to convert his objects beforehand.

pysqlite uses "hasattr(value, '_quote')", but I don't think
there is any spec support for that.

>They're not blocking %f just as they're not blocking %.2s. My argument is 
>that the python % operator is being used because it's convenient, but 
>returns inconsistent results of you step outside the spec's predication 
>(my interpretation of it) to only use %s and %(keyname)s.

They *are* blocking the *use* of %f by running all
numbers though repr() for no good reason. ;)

>>I also think that several people have misunderstood the
>>specification, even if it's explicit about allowing the
>>ANSI C format codes if you read it properly.
>
>The DB-API 2.0 spec makes NO mention of ANSI C format codes ... that's 
>just an implication :)

The current version certainly does:

http://www.python.org/peps/pep-0249.html :

Title: Python Database API Specification v2.0

...

         paramstyle

             String constant stating the type of parameter marker
             formatting expected by the interface. Possible values are
             [2]:
                 'qmark'         Question mark style,
                                 e.g. '...WHERE name=?'
                 'numeric'       Numeric, positional style,
                                 e.g. '...WHERE name=:1'
                 'named'         Named style,
                                 e.g. '...WHERE name=:name'
                 'format'        ANSI C printf format codes,
                                 e.g. '...WHERE name=%s'
                 'pyformat'      Python extended format codes,
                                 e.g. '...WHERE name=%(name)s'



--
Magnus Lycka (It's really Lyckå), magnus at thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The shortest path from thought to working program 




More information about the DB-SIG mailing list