[DB-SIG] Remaining issues with DB API 2.0

M.-A. Lemburg mal@lemburg.com
Sun, 28 Mar 1999 16:59:07 +0200

Note: The URL for the API spec has changed:


Greg Stein wrote:
> M.-A. Lemburg wrote:
> >
> > Hi everybody,
> >
> > According to the proposed schedule for the DB API, today (1999-03-26)
> > is the Final Call deadline. We made some very good progress so far and I
> > would like to thank everybody who joined in.
> I believe we're getting close. Let's not publish too early -- dates are
> great to get things moving and we're seeing that, but we don't have any
> obligation to stick to it as long as we continue to see forward progress
> and good discussion. I'd say let's simply wrap up the effort this week.
> We certainly aren't seeing a lot of new people/issues jumping in lately.
> Can we state that discussion will continue this week, producing a final
> edit on Friday, April 2nd? Publish on the 5th?


> >...
> >         Should we add additional constructor signatures to the API
> >          spec ? E.g. for monetary values (not defined in standard SQL
> >          btw).
> >
> > I suggest postponing this decision to the next spec release.
> And I have suggested that we remove the DATETIME and TIMESTAMP
> constructors. I still do not understand why we have the difference.
> There are only a few general types of data: STRING, BLOB, DATE, NUMBER,
> and ROWID.
> DATETIME and TIMESTAMP are simply specializations of a generic DATE
> type. They should be handled using the "==" thing for types that you
> introduced. Module implementors can extend the set of constructors if
> they'd like, but the baseline spec should be narrow and provide a
> constructor that takes the standard six parameters: year, month, day,
> hour, minute, and second.

The differentiation between date, time and timestamp is needed
simply because they point to three different informations. A single
date constructor + corresponding type object is not enough for
a module to decide whether to take only the date, the time, or
both parts to format some data to pass to the database.

Take e.g. a module that uses literal SQL statement formatting
for interfacing. Such a module would not be able to tell
whether to format a date as time only value, timestamp or
date only value. If it were to always pass in timestamp values,
the database would probably raise an error for DATE and TIME

So dropping the two constructors renders that kind of interface
useless. Becides, explicitly using the constructors in you
database interface script will surely make them more readable :-)

As for the type objects: we could group date/time values
collectively under DATETIME. Would that be in your interest ?

> >         Naming of some of the values for the .paramstyle attribute:
> >          Andy Dustman proposed to change "percent" and "xpercent"
> >          to "format" and "dictformat". I prefer "pyformat" instead of
> >          "dictformat"...
> format and pyformat seem fine.
> [ and it is in the spec like this ]
> The part about multiple paramstyles should not be included. A database
> should take just one style and be done with it. Higher levels can
> perform the appropriate mappings.

Andy ?

> >         Should we extend the notion of "sequences" in .execute()
> >          to "objects providing __len__ and __getitem__ methods" ?
> >          See my reply to Andy Dustman for a detailed explanation
> >          of the benefits.
> This seems fine, but I would leave the description of execute() and
> executemany() as taking a sequence or a dictionary, with a reference to
> a footnote. In the footnote, state that an implementation should limit
> its use of the sequence/mapping interface to getitem and length
> functionality.

That's a document formatting question. I'll look into it.

> >         Add a section pointing out the changes from 1.0 to 1.1 and ways
> >          to port existing scripts to the updated API spec.
> >
> > Could someone help with this one ?!
> Sure. We should probably rename the spec to 2.0 because of the big
> changes. (1) it is incompatible in a number of ways, and (2) we flat-out
> dropped "dbi" from the spec.

I guess you're right. So the next spec update will be versioned

> -----
> I have some additional comments on the spec from a final read-thru:
> * the connect() method appears to state "definitive" parameters. It
> should NOT do that, as I described before. For example, it seems to
> imply that a parameter named "dsn" is required. That is not the case for
> most databases. While your keyword code for C modules is fine, why can't
> we simply state that the connect() method takes an appropriate number of
> parameters (possibly keyword-based) which allows a connection to be
> made? I don't believe we need to legislate keyword params or anything
> like that. We don't anywhere else, so let's not do it here. For example:
> a DBM-style database is going to take a single parameter: a filename. It
> doesn't need a keyword, and it doesn't match the ones that you list in
> the doc.

Hmm, I guess you've got a point there. I'll make the parameter
list a guideline then (this part of the spec isn't going to be
portable anyway...). Still, I'll leave the keyword stuff in --
people should start using keyword based functions in C more

> * if you want to provide an example, then place it into a
> footnote/appendix and reference it from the connect() method.


> * the 'qmark' paramstyle should not be assumed. All 2.0 DBAPI modules
> must specify the paramstyle attribute (along with apilevel and
> threadsafety). Further, the 'qmark' style is not the best style. Named
> or positional are best, because they allow you to use the same parameter
> multiple times without requiring the user to respecify it over and over.
> For example: "select column_a1, column_b1 from a, b where a.key = :1 and
> b.key = :1". If anything, we should recommend not using the qmark style
> if a module implementor has a choice.

True. I'll take those comments out.

> * the comment in the paramstyle text about 'named' and 'xpercent' no
> longer applies with the recent change to execute().


> * a note needs to be added to execute() and executemany() to state that
> parameters should not require any preprocessing before being passed. To
> the client programmer, the interface should appear as if a value is
> going to be "bound" to the input, rather than substituted into the query
> string.

Not sure, why you want this. The spec already says:
"Parameters may be provided as sequence or
 mapping and will be bound to variables in the operation."

> * the apilevel should move up to the top, along with a reference to the
> 1.0 spec. "if this constant is not specified, then a 1.0 DBAPI should be
> assumed. see http://... for more information."  (we'll leave the old
> spec at a URL that includes its version number; same for this spec; the
> old DatabaseAPI.html document can be a page listing each version with a
> provided URL)


> * I do not understand the difference between OperationalError and
> ProgrammingError. The examples seem to divide the two exceptions
> arbitrarily. What is the "rule" for deciding which is which?

OperationalErrors are all types of errors that are related
to things that are not necessarily under the control of the
programmer, e.g. lost connections, log full, transaction
could not be performed due to lack of memory or lack of
functionality, etc.

ProgrammingError refer to things that are under the programmers
control, e.g. he uses a table that does not exist, he specifies
the wrong number of parameters for a statement, uses wrong
SQL syntax, etc.

Hmm, maybe OperationalError wasn't the right choice for
the dynamic .rollback() error after all...

There is some overlap and in fact some DBs raise ProgrammingErrors
for certain things while others use OperationalError. Maybe we
should make one a subclass of the other... ?

> * I'm not sure why we have the part about named cursors in the spec,
> since it specifically states they are not part of the spec. Could we
> move that to an appendix that lists ways to extend the functionality,
> but which have not been fully specified (yet) ?

I turned it into a footnote.

> * I still do not understand the presence of the rowcount attribute.
> *Especially* since the execute() method now says "return value
> undefined." That's bogus. Let's nuke the rowcount attribute and return
> *that* value from execute. No reason to do an execute and then go
> somewhere else for this information.

The "return value undefined" was done to allow the .execute()
method return the old style return value (which is not easily
implementable for various reasons I already pointed out in
previous mails). The module implementor is free to return
whatever he likes.

The rowcount definition is somewhat different from the 1.0
return value spec of .execute(). That's why it's a new attribute.
Again, module implementors could go and simply have .execute()
return whatever the value of .rowcount would be if they like...

> * for the execute() return value (merged from the rowcount thing), we
> should discriminate between "I know the number and it is X" and "I have
> no idea". The latter should be None. Note that -1 no longer applies
> since the value is only available as part of an execute, rather than a
> random-access. I agree with your earlier comments about needing to
> remove the bit about detecting DML vs DDL vs DQL. I like your current
> text, with the additions of "0 or None for DDL statements" and "None for
> cases where the count cannot be determined".

The problem with None is that it compares false which would
be ok for DDL (0 or None both are false), but fails to indicate
"don't know" for DQL where 0 could be a possible known row count.

As for the random-access vs. return value: some DBs could be
able to calculate the row count only if at least one fetch was
done. So the information might not be available right after the

How about defining the return value .execute() as being the
current .rowcount attribute's value as you propose and provide
the .rowcount as additional, possibly optional feature ?

> * the comment in fetchone() about the underlying cursor should be
> removed. The Python programmer can never access that thing, so the
> comment is irrelevant. The comment about using array fetches and whatnot
> still apply -- in particular, the possible use of the arraysize
> attribute.

The comment is simply a warning to those who might expect the
fetchone() method to move the cursor by one row. This may be
true for some interfaces (e.g. mxODBC) while being false for others.

> * in all methods that state "an exception is raised..." we should state
> *which* exception. We have a complete set defined at the beginning, so
> this makes sense to do. I would recommend InterfaceError (it is based on
> the interface we provide, rather than the underlying database). Hrm.
> Maybe not. I can see a case where the DB module will simply attempt to
> call the underlying DB function and generate a DB-related exception.
> Thoughts anyone?

This is hard to do, since the types of errors may very well be
the whole range of possible Python exceptions + the ones defined
in the DB API, e.g. a conversion routine might raise an OverflowError
in case of numeric overflow while raising an InterfaceError in
case a certain internal assertion failed.

> * on nextset(), it might be nice to define a return value that allows
> the user to discriminate these types of results: no more sets (None
> right now); another set with N rows; another set with unknown number of
> rows. The only thing really available for the last one is a negative
> number (e.g. -1), but that seems a bit hokey. Any ideas?

I've got no experience with multiple result sets, but do that there
was a point to define the return value in the fuzzy way it is
written in the spec.

> * on setinputsizes() and setoutputsizes(), the text kind of implies that
> an implementation may omit it. Maybe it could read, "Implementors are
> free to have this method do nothing, and users are free to not use it."


Marc-Andre Lemburg                               Y2000: 278 days left
          : Python Pages >>> http://starship.skyport.net/~lemburg/  :