[DB-SIG] Remaining issues with DB API 1.1

Greg Stein gstein@lyra.org
Sun, 28 Mar 1999 02:26:34 -0800

M.-A. Lemburg wrote:
> Hi everybody,
> According to the proposed schedule for the DB API, today (1999-03-26)
> is the Final Call deadline. We made some very good progress so far and I
> would like to thank everybody who joined in.

I believe we're getting close. Let's not publish too early -- dates are
great to get things moving and we're seeing that, but we don't have any
obligation to stick to it as long as we continue to see forward progress
and good discussion. I'd say let's simply wrap up the effort this week.
We certainly aren't seeing a lot of new people/issues jumping in lately.

Can we state that discussion will continue this week, producing a final
edit on Friday, April 2nd? Publish on the 5th?

>         Should we add additional constructor signatures to the API
>          spec ? E.g. for monetary values (not defined in standard SQL
>          btw).
> I suggest postponing this decision to the next spec release.

And I have suggested that we remove the DATETIME and TIMESTAMP
constructors. I still do not understand why we have the difference.
There are only a few general types of data: STRING, BLOB, DATE, NUMBER,
and ROWID.

DATETIME and TIMESTAMP are simply specializations of a generic DATE
type. They should be handled using the "==" thing for types that you
introduced. Module implementors can extend the set of constructors if
they'd like, but the baseline spec should be narrow and provide a
constructor that takes the standard six parameters: year, month, day,
hour, minute, and second.

>         Are setinputsize, setoutputsize optional in the sense that the
>          APIs itself may be left out by the module implementor or rather
>          that they must be provided but may be implemented with void
>          functionality.
> I guess they should always be available.

[ and it is in the spec like this ]

>         Naming of some of the values for the .paramstyle attribute:
>          Andy Dustman proposed to change "percent" and "xpercent"
>          to "format" and "dictformat". I prefer "pyformat" instead of
>          "dictformat"...

format and pyformat seem fine.
[ and it is in the spec like this ]

The part about multiple paramstyles should not be included. A database
should take just one style and be done with it. Higher levels can
perform the appropriate mappings.

>         Should we extend the notion of "sequences" in .execute()
>          to "objects providing __len__ and __getitem__ methods" ?
>          See my reply to Andy Dustman for a detailed explanation
>          of the benefits.

This seems fine, but I would leave the description of execute() and
executemany() as taking a sequence or a dictionary, with a reference to
a footnote. In the footnote, state that an implementation should limit
its use of the sequence/mapping interface to getitem and length

>         Add a section pointing out the changes from 1.0 to 1.1 and ways
>          to port existing scripts to the updated API spec.
> Could someone help with this one ?!

Sure. We should probably rename the spec to 2.0 because of the big
changes. (1) it is incompatible in a number of ways, and (2) we flat-out
dropped "dbi" from the spec.


I have some additional comments on the spec from a final read-thru:

* the connect() method appears to state "definitive" parameters. It
should NOT do that, as I described before. For example, it seems to
imply that a parameter named "dsn" is required. That is not the case for
most databases. While your keyword code for C modules is fine, why can't
we simply state that the connect() method takes an appropriate number of
parameters (possibly keyword-based) which allows a connection to be
made? I don't believe we need to legislate keyword params or anything
like that. We don't anywhere else, so let's not do it here. For example:
a DBM-style database is going to take a single parameter: a filename. It
doesn't need a keyword, and it doesn't match the ones that you list in
the doc.

* if you want to provide an example, then place it into a
footnote/appendix and reference it from the connect() method.

* the 'qmark' paramstyle should not be assumed. All 2.0 DBAPI modules
must specify the paramstyle attribute (along with apilevel and
threadsafety). Further, the 'qmark' style is not the best style. Named
or positional are best, because they allow you to use the same parameter
multiple times without requiring the user to respecify it over and over.
For example: "select column_a1, column_b1 from a, b where a.key = :1 and
b.key = :1". If anything, we should recommend not using the qmark style
if a module implementor has a choice.

* the comment in the paramstyle text about 'named' and 'xpercent' no
longer applies with the recent change to execute().

* a note needs to be added to execute() and executemany() to state that
parameters should not require any preprocessing before being passed. To
the client programmer, the interface should appear as if a value is
going to be "bound" to the input, rather than substituted into the query

* the apilevel should move up to the top, along with a reference to the
1.0 spec. "if this constant is not specified, then a 1.0 DBAPI should be
assumed. see http://... for more information."  (we'll leave the old
spec at a URL that includes its version number; same for this spec; the
old DatabaseAPI.html document can be a page listing each version with a
provided URL)

* I do not understand the difference between OperationalError and
ProgrammingError. The examples seem to divide the two exceptions
arbitrarily. What is the "rule" for deciding which is which?

* I'm not sure why we have the part about named cursors in the spec,
since it specifically states they are not part of the spec. Could we
move that to an appendix that lists ways to extend the functionality,
but which have not been fully specified (yet) ?

* I still do not understand the presence of the rowcount attribute.
*Especially* since the execute() method now says "return value
undefined." That's bogus. Let's nuke the rowcount attribute and return
*that* value from execute. No reason to do an execute and then go
somewhere else for this information.

* for the execute() return value (merged from the rowcount thing), we
should discriminate between "I know the number and it is X" and "I have
no idea". The latter should be None. Note that -1 no longer applies
since the value is only available as part of an execute, rather than a
random-access. I agree with your earlier comments about needing to
remove the bit about detecting DML vs DDL vs DQL. I like your current
text, with the additions of "0 or None for DDL statements" and "None for
cases where the count cannot be determined".

* the comment in fetchone() about the underlying cursor should be
removed. The Python programmer can never access that thing, so the
comment is irrelevant. The comment about using array fetches and whatnot
still apply -- in particular, the possible use of the arraysize

* in all methods that state "an exception is raised..." we should state
*which* exception. We have a complete set defined at the beginning, so
this makes sense to do. I would recommend InterfaceError (it is based on
the interface we provide, rather than the underlying database). Hrm.
Maybe not. I can see a case where the DB module will simply attempt to
call the underlying DB function and generate a DB-related exception.
Thoughts anyone?

* on nextset(), it might be nice to define a return value that allows
the user to discriminate these types of results: no more sets (None
right now); another set with N rows; another set with unknown number of
rows. The only thing really available for the last one is a negative
number (e.g. -1), but that seems a bit hokey. Any ideas?

* on setinputsizes() and setoutputsizes(), the text kind of implies that
an implementation may omit it. Maybe it could read, "Implementors are
free to have this method do nothing, and users are free to not use it."

That's all :-)


Greg Stein, http://www.lyra.org/