[DB-SIG] Remaining issues with DB API 2.0

M.-A. Lemburg mal@lemburg.com
Thu, 01 Apr 1999 11:31:33 +0200

Greg Stein wrote:
> > > On Wed, 31 Mar 1999, M.-A. Lemburg wrote:
> > >
> > > > To make testing for "multiple result sets supported" possible,
> > > > I think this API should be an optional method like .rollback:
> > > > either not be defined (giving an AttributeError) or raise
> > > > a NotSupportError... which brings us to the next point ;-)
> > > >
> > > >  Should we add NotSupportedError as subclass of OperationalError ?
> Nope. IMO, there are already too many exceptions, and some of them are
> rather unclear on how they are distinguished from one another.

Actually, those exceptions all originated from the odbc module
in win32all... weren't you involved in the writing of that module ?

You have got a point there about the fuzzyness of the definitions,
I won't except the point about them being too many: since they
are layered, it is really totally up to the user and the implementor
which exceptions to catch or raise and defining exception class is
no burdon on programmers (there are simple to use APIs available)
even if they choose not to use them.

This is the current class layout:


I would suggest extending the tree to allow Greg's proposed
DatabaseError to fit in:


This scheme can be extended by the programmer at will. I'll include
cut&paste code in the implementation hints section.

> > > >
> > > >  Is there a strong need for "capability testing" other than using
> > > >   hasattr() ?
> Nope.

Huh :-) Didn't you bring this up ??! Nevermind, I'm not too keen
on having it either, but I do like your clean and simple method
using hasattr().

> > >
> > > I have no objections to making it an optional method. As for other
> > > capability testing, I need to think that any program that relies upon some
> > > capabilty to function is simply going to break (hopefully in an obvious
> > > way) when it uses a database which doesn't have the capability in
> > > question.
> >
> > Ok. Greg ?
> I think it should be "optional void behavior, returning None". The lack
> of a nextset() will probably not utterly break an application like lack
> of a rollback. The thing here is that a nice behavior exists for nextset
> when it isn't available: simply return None (which also happens to mean
> "no more"). The semantics of its presence also match the semantics of
> its absence.
> I say leave the bugger in.

That makes 2 for "apply the same procedure as for .rollback()"
and 1 for "leave it as it is defined now".

Any others who would like to jump in ?

> > > This should not actually be part of the specification, but perhaps we need
> > > a design note, or guidelines for designing applications so that they can
> > > ported between different databases with a minimum of effort. This makes
> > > your application more resilient to changes in the DB API as well (though
> > > it's hardly affecting my application, if at all, and the API shouldn't
> > > change much more in the future).
> >
> > Would make a nice addendum to the spec... or maybe a separate
> > document. Volunteers ? Andy :-) ?
> IMO, very few applications need this portability. I'd definitely ask
> that it be a second document, published in the db-sig area.
> Review the apps that you have: how many were written to be truly
> portable? Is your SQL portable? How about those column types? etc. I
> know mine aren't, so I'm specifically anti-volunteering to do this :-)

Oh well, on one hand we are constantly arguing in favor
of database portability, on the other we simply state "not doable
anyways". This doesn't lead anywhere, I'm afraid. The DB API should
provide all the means it can to be able to write database
portable code.

> M.-A. Lemburg wrote:
> >...
> > The differentiation between date, time and timestamp is needed
> > simply because they point to three different informations. A single
> > date constructor + corresponding type object is not enough for
> > a module to decide whether to take only the date, the time, or
> > both parts to format some data to pass to the database.
> > 
> > Take e.g. a module that uses literal SQL statement formatting
> > for interfacing. Such a module would not be able to tell
> > whether to format a date as time only value, timestamp or
> > date only value. If it were to always pass in timestamp values,
> > the database would probably raise an error for DATE and TIME
> > columns.
> > 
> > So dropping the two constructors renders that kind of interface
> > useless. Becides, explicitly using the constructors in you
> > database interface script will surely make them more readable :-)
> Your constructors aren't sufficient anyhow(!).

Oh, they are w/r to what ANSI-SQL defines. The number types
can all be expressed with standard Python numbers, the character
types with Python strings and Binary() objects. All that was
left were the TIME and TIMESTAMP types.

I don't quite understand why you oppose having a bit more
variaty in the spec. It can all be coded in Python in a few
lines so there's really no big strain on the programmer.

> As I pointed out once
> before, MySQL has many more varieties than a simple DATE, TIME, and
> TIMESTAMP. Hell, the TIMESTAMP has about six varieties.
> However, I do believe that MySQL can always take a fully-qualified
> date/time and will trim it back as necessary. This implies a single
> constructor.
> What is the case for ODBC? Do you actually need to *bind* these
> differently? (which is the major reason to have these constructors --
> input binding!)

The main reason is to be ANSI-SQL conform. If the database
suppports other types (just have a look at Postgresql), these
are bound to be database specific extensions, yet all databases
that claim to support ANSI-SQL must provide these few basic types.

> > As for the type objects: we could group date/time values
> > collectively under DATETIME. Would that be in your interest ?
> This would certainly be better. I wouldn't really have a problem if we
> said that a "column type" of dbmodule.DATETIME existed. There may be
> many *Python* types which compare equal to that, but those subtypes are
> at the discretion of the module implementor.
> I also believe that if a particular database module requires additional
> granularity for *input* types, then it will have additional constructors
> beyond the 6-tuple and the ticks.

Sure, there's nothing to say against that. In fact, database
specific extensions should be welcomed. We're only trying to
settle on a basic set of features here and the guideline for this
should be ANSI-SQL and common experience with existing database

> A real problem that we're having here is that we don't have a sufficient
> review of the extensions that have been made for the different
> databases. What have module implementors needed to do to get their
> modules to work? (not just to operate nicer, but to *work*)

Oh, we have three module implementors on-line ;-)

> > > * the 'qmark' paramstyle should not be assumed. All 2.0 DBAPI modules
> > > must specify the paramstyle attribute (along with apilevel and
> > > threadsafety). Further, the 'qmark' style is not the best style. Named
> > > or positional are best, because they allow you to use the same parameter
> > > multiple times without requiring the user to respecify it over and over.
> > > For example: "select column_a1, column_b1 from a, b where a.key = :1 and
> > > b.key = :1". If anything, we should recommend not using the qmark style
> > > if a module implementor has a choice.
> > 
> > True. I'll take those comments out.
> I would like to see a stated preference for 'numeric', 'named', and/or
> 'pyformat'. They are marginally clearer, and easier to use.

Ok. I added a comment as footnote.

> > > * a note needs to be added to execute() and executemany() to state that
> > > parameters should not require any preprocessing before being passed. To
> > > the client programmer, the interface should appear as if a value is
> > > going to be "bound" to the input, rather than substituted into the query
> > > string.
> > 
> > Not sure, why you want this. The spec already says:
> > "Parameters may be provided as sequence or
> >  mapping and will be bound to variables in the operation."
> I see that, but people have still written substitution-based modules
> rather than binding-based modules. Can we have a footnote on the word
> bound, that reads: "The term "bound" refers to the process of binding an
> input value to a database execution buffer. In practical terms, this
> means that the input value is directly used as a value in the operation.
> The client should not be required to "escape" the value so that it can
> be used -- the value should be equal to the actual database value."

Ok. Also added as footnote.

> > > * I do not understand the difference between OperationalError and
> > > ProgrammingError. The examples seem to divide the two exceptions
> > > arbitrarily. What is the "rule" for deciding which is which?
> > 
> > OperationalErrors are all types of errors that are related
> > to things that are not necessarily under the control of the
> > programmer, e.g. lost connections, log full, transaction
> > could not be performed due to lack of memory or lack of
> > functionality, etc.
> > 
> > ProgrammingError refer to things that are under the programmers
> > control, e.g. he uses a table that does not exist, he specifies
> > the wrong number of parameters for a statement, uses wrong
> > SQL syntax, etc.
> All right. How about this: under "OperationalError" it states "data
> source name not found". Whose fault is that? The programmer for entering
> a typo'd DSN, or the administrators for not configuring it?
> Under ProgrammingError, it states one case is a "table not found." That
> isn't the programmer's fault cuz somebody deleted the table. Oh, but
> maybe it is because he misspelled the table name.
> Oh, and how is the InternalError different than OperationalError? Not my
> fault the cursor died on me.
> If I pass an integer that is too large, is that a ProgrammingError or a
> DataError?
> The differences here are too fine-grained to be usefully delineated in
> the specification. We should probably cut the number of exceptions in
> half.

No, but we should add another layer for DatabaseErrors as you
propose below (see exception layout above).

> > Hmm, maybe OperationalError wasn't the right choice for
> > the dynamic .rollback() error after all...
> > 
> > There is some overlap and in fact some DBs raise ProgrammingErrors
> > for certain things while others use OperationalError. Maybe we
> > should make one a subclass of the other... ?
> I'd say we make two subclasses of Error: InterfaceError, and
> DatabaseError.

See above.

> > > * I still do not understand the presence of the rowcount attribute.
> > > *Especially* since the execute() method now says "return value
> > > undefined." That's bogus. Let's nuke the rowcount attribute and return
> > > *that* value from execute. No reason to do an execute and then go
> > > somewhere else for this information.
> > 
> > The "return value undefined" was done to allow the .execute()
> > method return the old style return value (which is not easily
> > implementable for various reasons I already pointed out in
> > previous mails). The module implementor is free to return
> > whatever he likes.
> > 
> > The rowcount definition is somewhat different from the 1.0
> > return value spec of .execute(). That's why it's a new attribute.
> > Again, module implementors could go and simply have .execute()
> > return whatever the value of .rowcount would be if they like...
> If we make it undefined, then we are stating that in DBAPI 2.0, it
> cannot be used (without consulting the db-specific doc). In other words:
> somebody has to update their client program.

That depends on the module implementation. I think this part of
the spec is much like the connect() part: it is deemed to be
database dependent.

Note that a return value of "don't know" would have to be possible
for .executeXXX() otherwise and this would of course render the
return value useless, because if the method can return this value,
the database script would still not be able to rely on the return value
being meaningful.

> If they are going to update their program, then let's make execute()
> return something rational.
> > > * for the execute() return value (merged from the rowcount thing), we
> > > should discriminate between "I know the number and it is X" and "I have
> > > no idea". The latter should be None. Note that -1 no longer applies
> > > since the value is only available as part of an execute, rather than a
> > > random-access. I agree with your earlier comments about needing to
> > > remove the bit about detecting DML vs DDL vs DQL. I like your current
> > > text, with the additions of "0 or None for DDL statements" and "None for
> > > cases where the count cannot be determined".
> > 
> > The problem with None is that it compares false which would
> > be ok for DDL (0 or None both are false), but fails to indicate
> > "don't know" for DQL where 0 could be a possible known row count.
> If people are looking for None, then they can use "is None" rather than
> "not".
> true-ness / false-ness is obviously not the appropriate way to
> discriminate the values.
> result = curs.execute(op)
> if result is None:
>   # database doesn't know how many rows were returned/affected
> elif result == 0:
>   # no rows returned/affected
> else:
>   # some positive number of rows were returned/affected
> For DDL: a database will return 0 or None (whichever is convenient)
> For DML: a database will return >=0 or None (if it can't determine the
> number of rows affected... at one time, PostGres couldn't)
> For DQL: a database will return >=0 or None (if it can't determine the
> number of rows returned until a fetchXXX() is performed)
> A footnote with the above three lines would be a handy "implementation
> hint" for developers. I believe it also fits in with your need to not be
> required to parse the statement (which I agree is something to avoid!).

But what use would this definition have: my module could always
return None and be in sync with the spec. Still, the return value
would have no value for a user of my module.
> > As for the random-access vs. return value: some DBs could be
> > able to calculate the row count only if at least one fetch was
> > done. So the information might not be available right after the
> > execute.
> > 
> > How about defining the return value .execute() as being the
> > current .rowcount attribute's value as you propose and provide
> > the .rowcount as additional, possibly optional feature ?
> Drop rowcount altogether. We don't need multiple ways to access the
> value.

As I said: it's not just another way to get that information.
The rowcount could become available after a .fecthXXX() call,
so it's dynamic... hmm, maybe it should be turned into a

> > > * the comment in fetchone() about the underlying cursor should be
> > > removed. The Python programmer can never access that thing, so the
> > > comment is irrelevant. The comment about using array fetches and whatnot
> > > still apply -- in particular, the possible use of the arraysize
> > > attribute.
> > 
> > The comment is simply a warning to those who might expect the
> > fetchone() method to move the cursor by one row. This may be
> > true for some interfaces (e.g. mxODBC) while being false for others.
> Understood, but why does the Python programmer care about this? It has
> no bearing on the interface since they have NO access to that cursor.
> The fact that it does or does not move forward N rows is moot -- the
> Python programmer only uses the DBAPI interface which has no "movement"
> semantics.
> Could this note be moved under the implementation hints?

Ok. It's a footnote now.

> > > * on nextset(), it might be nice to define a return value that allows
> > > the user to discriminate these types of results: no more sets (None
> > > right now); another set with N rows; another set with unknown number of
> > > rows. The only thing really available for the last one is a negative
> > > number (e.g. -1), but that seems a bit hokey. Any ideas?
> > 
> > I've got no experience with multiple result sets, but do that there
> > was a point to define the return value in the fuzzy way it is
> > written in the spec.
> I think it would be useful to have a specified return value, much like
> we have it for execute(). Strictly speaking, a nextset() is much like
> execute, but meaning "give me the next part of my operation" ... the
> client wants to know the results.
> Can we mark this as an open issue? Maybe somebody will have a good idea.

Ok, I'll add it to the open issues section.

> Ooh. I have a proposal: maybe we can have a global named unknownCount.
> This can be returned by execute() and by nextset(). In both cases, it
> means "I have no idea". Client programmers should test with an "is"
> rather than equivalence.  Oh, actually, if the underlying type does not
> specify a cmp() function, then a "==" compares the type name... this an
> "==" will work for client programmers, too, since the only other return
> value is a number. (would we still have None as acceptable? ... oh, for
> nextset(), yes).

Ehm,... have you had too much coffee ;-) 

What's so bad about -1, BTW ?

> New comment: should we change the globals to be: apiLevel, threadSafety,
> and paramStyle? Or maybe underscore-separated? Or leave them?  hmm... I
> think "Python style" says all lower-case for these, mixed signal on the
> underscore. Thoughts?

All the APIs use lower-case and no underscores, so I guess that's
not in sync with the rest. [Not that I like this naming scheme, but
it's the way it's been in Python for ages.]

Uff, made it... ;-)

Marc-Andre Lemburg                               Y2000: 274 days left
          : Python Pages >>> http://starship.skyport.net/~lemburg/  :