[DB-SIG] annotated 1.1 spec / feedback

Greg Stein gstein@lyra.org
Fri, 12 Mar 1999 03:52:09 -0800


M.-A. Lemburg wrote:
>...
> > While annotating, I've also prepared some feedback items:
> >
> > * Connect() should be lower-cased to connect(). Most Python module-level
> > functions are all lower case; we should follow suit.
> 
> Hmm, this is probably a religous question: I tend to always use names
> starting with a capital letter for classes and constructors.

As a reference for the "Python-way", I looked at anydbm.py. The open()
method returns a new object, and it is lower-cased.

Since the spec does not state that connect() is a class name, then IMO
it should be lower-cased. If we said it was a class, then I'd agree with
the upper-case.

Moot anyhow, as I see you've lower-cased it in the current spec. thx.

> > * the parameters should be positional, rather than keyword-based.
> > keywords can be optional, but we should not require the params to be
> > keyword-based.
> 
> But keywords are so much simpler to maintain: the order doesn't
> matter, only the wording. They also serve as nice way to document
> what the parameter stands for, e.g.
> 
> connect('degas','miro','kline')
> 
> isn't too easy to comprehend without first looking at the interface's
> docs, yet
> 
> connect(host='degas',user='miro',password='kline')
> 
> is.
> 
> If you fear the programming overhead, I can post a few C macros
> that make dealing with keywords in C really simple.

I believe we can only require keywords if we prepare an appendix to the
specification. As a module implementor, I do not do keywords... I always
use positional because they are much simpler for C-implemented modules.

If we do not have a simple reference, then I believe we cannot require
keywords. Even then, I am hesitant to do so.

[and yes, I fully understand their benefit for the user of the module;
let the user fix it in Python if it bugs them]

>...
> > * maybe we should discuss punting dbi. module distribution may be easier
> > without it. While it is handy for clients that use more than one DB
> > module,
> 
> Actually, it makes things harder for those cases (see above). mxODBC
> supports opening connections to many different databases and each
> has its own version of the dbi module because each subpackage uses
> its own set of exception classes.

If we tossed the dbi module, then each database would incorporate its
dbi into the module itself. I don't understant what the problem is here.

I'm definitely leaning towards axing dbi.

> > I can easily envision a python module that exports the
> > appropriate set for the DB being used.
> 
> Right: stick it all into the database interface package.

I'd like to see:

import mysqldb
...
  except mysqldb.IntegrityError:
...
  if curs.description[0][0] == mysqldb.STRING:
...

> > * cursor.rowcount is new, and is extraneous. The result of execute()
> > should be used.
> 
> The execute() method returns None in case there are rows available.
> It does not return the number of rows in the result set.

I am not sure that all databases can provide you the rowcount after an
execute(). You may need to fetch them first. Therefore, the .rowcount
couldn't even be implemented. This means that the only real way to know
how many rows there are is to fetch them -- rowcount could not be relied
upon.

> > * I think we should clean up the description attribute if possible. The
> > 7-tuple is/was very Oracle-specific.
> 
> Ok, things like display_size and internal_size are not really
> relevant, but why break existing code expecting these 7 entries (even
> if it probably does not use all of them) ?

I believe we should alter the specification, but that databases can
continue to return an appropriate number of fields.

For example: can we require two fields -- name and type. Beyond that, it
is up to the database module. Implementors can choose to continue with
the 7-tuple or bring it down to what they really support.

> > * The old description for execute() returning None was clearer. Note
> > that a SELECT can return no columns/rows (where your description says >
> > 0).
> 
> The None return value is to indicate that data is available
> to be fetched vie the fetchXXX() methods. If no rows where produced,
> something else should be returned, e.g. 1.

I do not believe that a database can always tell that rows were or were
not produced. You may need to fetch them first. Therefore, returning 1
is not possible for certain modules.

Okay. I just looked at DCOracle. The execute() method does just that...
an execute. No fetching, so no way to tell what the row count is.

Admittedly, there may be something in the OCI interface, or in the V8
Oracle interfaces to do so. Unless sometime can point out that the
Oracle interfaces support knowing the row count at execute() time, then
we cannot rely on your .rowcount and return-value changes.

Even so, I do not like seeing a SELECT returning "1" *ever*. That just
doesn't fit right.

> The reason I changed the description was that the previous one
> wasn't easy to implement: it is not clear whether a statement
> is DDL, DML or DQL without parsing it...

Untrue. It is easy to implement -- you can always tell from the first
word in all cases.

> > * in general, I think the 1.1 change to the return value description
> > didn't really clear things up. Basing the descriptions on returning
> > columns/rows doesn't make much sense to me. Instead, I liked the
> > description based on the *type* of statement that was performed. Under
> > this style, a SELECT that doesn't produce columns/rows would return 1,
> > but that is just wrong.
> 
> See above. Without parsing there isn't much you can do as module
> implementor to get the return value right. Maybe we should drop the
> return value altogether: rowcount gives the number of rows in the
> result set (even 0 for SELECTs that didn't produce any rows) or the
> number of affected rows.

As I mentioned, I think the rowcount is not possible for all databases
that use a pure execute/fetch model like Oracle.

> > * execute()'s description should change back to the terms "sequence" and
> > "sequence of sequences" for the parameters. There is no reason to force
> > them to be tuples or lists.
> 
> I thought about this: there is no way for the method to tell whether
> it was passed a sequence of sequences that is meant to be a
> sequence of parameter sequences or just a single sequence with
> sequence entries, e.g. ('abc',) would break under that definition.
> 
> Is there really such a big need to be able to pass *any* sequence
> as parameter ?

Yes.

What if you have a very large dataset which is encapsulated within a
class instance? Let's say it responds properly to the sequence behavior,
specifically so that it can be inserted without having to construct a
big ol' list to do so. I also can easily see rows that identified by
class instances or other builtin types, rather than always being tuples
or lists.

It certainly does hurt the implementor to use PySequence_GetItem()
rather than PyTuple_GetItem().

An alternative would be an "insert" method that specifically takes a
sequences of rows (where each row is a sequence). This would eliminate
the confusion between a sequence-of-sequence being a single row or a set
of rows.

I believe the INSERT is the only case where a sequence input makes
sense.

> > * the part about cursor movement in fetchone() should be totally
> > removed. fetchone() is about the CLIENT fetching a single row. It should
> > NOT affect how the underlying module performs its fetches.
> 
> That renders cursors useless... are you sure you want this ?

Absolutely. Positively. There is no way that we can or should require
single-row fetching.

Look at DCOracle. It uses the API nearly to its full extent (it doesn't
use input or output sizes to optimize its binding, though). Requiring
fetchone() to disable the array fetching would be a mistake, let alone
the fact that when you execute() the array-fetch there is no way to know
whether the person will be using fetchone or not!

> > * the fetch* methods should say they return a sequence or a sequence of
> > sequences. We should not force them to be tuples or lists of tuples.
> > Some DB modules may want to have additional semantics or capabilities on
> > their results. As long as it obeys sequence behavior, then the spec
> > should be happy. The arraysize attribute's description also refers to a
> > list of tuples, and should be updated.
> 
> Just thought it would be a good idea to keep execute() parameter
> input and fetchXXX() output in sync.

Right. And I think they should all sync up to be a
sequence-of-sequences. I can see an implementor wanting to create a
high-performance binding against a database that serves up values
directly from the output buffers rather than jamming them into lists of
tuples (with the requisite allocations).

Jim Fulton has argued for a long while that the module should be able to
return a highly functional result set. As long as it obeys the
sequence-of-sequence behavior, then he could now do so, yet still
"follow the spec".

thx!
-g

--
Greg Stein, http://www.lyra.org/