[DB-SIG] DB-API 1.1

Jim Fulton jim.fulton@Digicool.com
Wed, 03 Jun 1998 18:36:51 -0400


M.-A. Lemburg wrote:
> 
> M.-A. Lemburg wrote:
> >
> > I'd like to start discussing updating/extending the DB-API 1.0 to
> > a new version. IMHO, this needs to be done to
> >         a. clarify a few minor tidbits
> >         b. enable a more informative exception mechanism
> >
> > You can have a look at an alpha version of 1.1 at:
> >         http://starship.skyport.net/~lemburg/DatabaseAPI-1.1.html
> 
> Looks like everybody is too busy ... oh well. I'll wait another
> week or so then and repost the RFC.

I applaud you for taking this on....

Here are some comments.  I don't know what has changed since
1.0, so some things I gripe about may be inherited from 1.0.

I didn't comment enough on the 1.0 spec, so I'll try to make up 
for it now.

> 
> 
> Module Interface
> 
> The database interface modules should typically be named with
> something terminated by db.

Why?

> Existing examples are: oracledb,
> informixdb, and pg95db. These modules should export several names:
> 
> modulename(connection_string_or_tuple) 

Why use the module name?  Why not some descriptive name 
like: 'Connect'?
 
>      Constructor for creating a connection to the database. Returns a
>      Connection Object. In case a connection tuple is used, it should
>      follow this convention: (data_source_name, user, password).

Why allow a string or a tuple?  Doesn't this add non-portability?
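Purely to illustrate the portability concern, a driver accepting both forms might look like the sketch below. The string syntax shown is invented; real drivers each use their own, which is exactly the problem.

```python
def connect(connection_info):
    """Hypothetical constructor accepting either form the spec allows."""
    if isinstance(connection_info, tuple):
        # Portable form: (data_source_name, user, password)
        dsn, user, password = connection_info
    else:
        # Driver-specific string form -- here assumed "user/password@dsn",
        # an invented syntax; this branch is the non-portable part.
        credentials, dsn = connection_info.split("@")
        user, password = credentials.split("/")
    return (dsn, user, password)  # stand-in for a real Connection object

connect(("mydb", "scott", "tiger"))   # portable tuple form
connect("scott/tiger@mydb")           # driver-specific string form
```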
 
> error 
> 
>      Exception raised for errors in the database module's internal
>      processing. Other errors may also be raised. Database errors
>      should in general be made accessible through the exceptions
>      defined for the dbi abstraction module described below.

Maybe this should be InternalError.  Is this a DBI defined error, 
or is it local to the module?  Does it subclass from the DBI.Error
defined below?
 
> Connection Objects
> 
> Connection Objects should respond to the following methods: 
> 
> close() 
> 
>      Close the connection now (rather than whenever __del__ is
>      called). The connection will be unusable from this point forward;
>      an exception will be raised if any operation is attempted with
>      the connection.
> 
> commit() 
> 
>      Commit any pending transaction to the database. Note that if the
>      database supports an auto-commit feature, this must be initially
>      off. An interface method may be provided to turn it back on.
> 
> rollback() 
> 
>      Roll the database back to the start of any pending
>      transaction. Note that closing a connection without committing
>      the changes first will cause an implicit rollback to be
>      performed.

Why not have a begin() method?
 
> cursor() 
> 
>      Return a new Cursor Object. An exception may be thrown if the
>      database does not support a cursor concept.
> 
> callproc([params]) 
> 
>      Note: this method is not well-defined yet.  Call a stored
>      database procedure with the given (optional) parameters. Returns
>      the result of the stored procedure.

How are IN OUT and OUT parameters handled?
How common are stored procedures across database products 
anyway?
 
> all Cursor Object attributes and methods 
> 
>      For databases that do not have cursors and for simple
>      applications that do not require the complexity of a cursor, a
>      Connection Object should respond to each of the attributes and
>      methods of the Cursor Object. Databases that have cursors can
>      implement this by using an implicit, internal cursor.
> 
> Cursor Objects
> 
> These objects represent a database cursor, which is used to
> manage the context of a fetch operation. 
> 
> Cursor Objects should respond to the following methods and attributes: 
> 
> arraysize 
> 
>      This read/write attribute specifies the number of rows to fetch
>      at a time with fetchmany(). This value is also used when
>      inserting multiple rows at a time (passing a tuple/list of
>      tuples/lists as the params value to execute()). This attribute
>      will default to a single row.
> 
>      Note that the arraysize is optional and is merely provided for
>      higher performance database interactions. Implementations should
>      observe it with respect to the fetchmany() method, but are free
>      to interact with the database a single row at a time.

Doesn't fetchmany accept a count?  Why is this attribute
needed?
 
> description 
> 
>      This read-only attribute is a tuple of 7-tuples. Each 7-tuple
>      contains information describing each result column: (name,
>      type_code, display_size, internal_size, precision, scale,
>      null_ok). This attribute will be None for operations that do not
>      return rows or if the cursor has not had an operation invoked via
>      the execute() method yet.
> 
>      The type_code is equal to one of the dbi type objects specified
>      in the section below.
> 
>      Note: this is a bit in flux. Generally, the first two items of
>      the 7-tuple will always be present; the others may be database
>      specific.

This is bad.  I suppose we are stuck with this for backwards
compatibility.

If I were designing this interface I would have description
be a collection object that acted as both a sequence of
column definitions and a mapping from column name to column 
definitions.  I would have the column definitions be objects
that have methods for standard attributes like name, and type
(and maybe nullability, scale and precision)
as well as optional attributes for things like display size and
internal size.

I suppose that with some trickery, this could be handled in a mostly
backward compatible way, by making column-definitions sequence objects
too.
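A minimal sketch of what I mean (all names here are illustrative, not a proposed API):

```python
class ColumnDefinition:
    """Hypothetical column definition exposing the standard attributes."""
    def __init__(self, name, type_code, nullable=1, precision=None, scale=None):
        self.name = name
        self.type_code = type_code
        self.nullable = nullable
        self.precision = precision
        self.scale = scale

class Description:
    """Acts as a sequence of column definitions *and* as a mapping
    from column name to definition."""
    def __init__(self, columns):
        self._columns = list(columns)
        self._by_name = {}
        for col in self._columns:
            self._by_name[col.name] = col
    def __len__(self):
        return len(self._columns)
    def __getitem__(self, key):
        if isinstance(key, str):
            return self._by_name[key]   # mapping access by column name
        return self._columns[key]       # sequence access by position

desc = Description([ColumnDefinition("id", "NUMBER"),
                    ColumnDefinition("name", "STRING")])
desc[0].name            # access by position
desc["name"].type_code  # access by column name
```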

> close() 
> 
>      Close the cursor now (rather than whenever __del__ is
>      called). The cursor will be unusable from this point forward; an
>      exception will be raised if any operation is attempted with the
>      cursor.
> 
> execute(operation [,params]) 
> 
>      Execute (prepare) a database operation (query or
>      command). Parameters may be provided (as a sequence
>      (e.g. tuple/list)) and will be bound to variables in the
>      operation. Variables are specified in a database-specific
>      notation (some DBs use ?,?,? to indicate parameters, others
>      :1,:2,:3) that is based on the index in the parameter tuple
>      (position-based rather than name-based).

The format of parameter references should be standardized.
Maybe with something more Pythonic, like:

  %s, %d, %f

This might also allow type information to be captured to 
aid in binding variables.
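As a rough sketch of how such markers could carry type information for binding (the format codes and type names here are invented for illustration):

```python
import re

# Hypothetical mapping from format codes to database-side types.
FORMAT_TYPES = {"s": "STRING", "d": "NUMBER", "f": "NUMBER"}

def placeholder_types(operation):
    """Scan an operation for %-style markers and report the implied
    type of each positional parameter, in order."""
    return [FORMAT_TYPES[code] for code in re.findall(r"%([sdf])", operation)]

placeholder_types("insert into t values (%s, %d, %f)")
# -> ['STRING', 'NUMBER', 'NUMBER']
```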
 
>      The parameters may also be specified as a sequence of sequences
>      (e.g. a list of tuples) to insert multiple rows in a single
>      operation.

Does this run the insert multiple times, or does it bind
some sort of array to the input parameters?  Is this
useful enough to include in this standard?  It feels like
a lot of extra burden for DBI interface developers.

>      A reference to the operation will be retained by the cursor. If
>      the same operation object is passed in again, then the cursor can
>      optimize its behavior. This is most effective for algorithms
>      where the same operation is used, but different parameters are
>      bound to it (many times).

This sounds a bit too magical to me.  Does it apply when no arguments
are presented?  I'd rather see an explicit prepare method, preferably
on a connection object that returns a callable object, as in:


  f=aConnection.prepare(
      "select * from mydata where id=%d and name=%s")
  ...
  x=f(1, 'foo')
  ...
  y=f(2, 'bar')
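To make that concrete, here is a minimal self-contained sketch of the prepare() idea; the Connection class and its execute() are stand-ins, not a real driver:

```python
class Connection:
    """Hypothetical connection supporting explicit statement preparation."""
    def execute(self, operation, params=()):
        # Stand-in for a real driver call; just echoes what would be run.
        return (operation, tuple(params))

    def prepare(self, operation):
        """Return a callable bound to one parsed/planned statement."""
        def bound(*params):
            # A real driver would reuse the server-side plan here.
            return self.execute(operation, params)
        return bound

conn = Connection()
f = conn.prepare("select * from mydata where id=%d and name=%s")
x = f(1, 'foo')
y = f(2, 'bar')
```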

 
>      For maximum efficiency when reusing an operation, it is best to
>      use the setinputsizes() method to specify the parameter types and
>      sizes ahead of time. It is legal for a parameter to not match the
>      predefined information; the implementation should compensate,
>      possibly with a loss of efficiency.

I think that this could better be handled using more Pythonic
placeholders.  I don't like having to specify sizes for strings,
since I may want to use a type like 'long var binary' that effectively
doesn't have an upper limit.
 
>      Using SQL terminology, these are the possible result values from
>      the execute() method:
> 
>           If the statement is DDL (e.g. CREATE TABLE), then 1 is
>           returned.

This seems a bit arbitrary to me.  
 
>           If the statement is DML (e.g. UPDATE or INSERT), then the
>           number of rows affected is returned (0 or a positive
>           integer).
> 
>           If the statement is DQL (e.g. SELECT), None is returned,
>           indicating that the statement is not really complete until
>           you use one of the fetch methods.
> 
> fetchone() 
> 
>      Fetch the next row of a query result, returning a single tuple,
>      or None when no more data is available.
> 
> fetchmany([size]) 
> 
>      Fetch the next set of rows of a query result, returning as a list
>      of tuples. An empty list is returned when no more rows are
>      available. The number of rows to fetch is specified by the
>      parameter. If it is None, then the cursor's arraysize determines
>      the number of rows to be fetched.
> 
>      Note there are performance considerations involved with the size
>      parameter. For optimal performance, it is usually best to use the
>      arraysize attribute. If the size parameter is used, then it is
>      best for it to retain the same value from one fetchmany() call to
>      the next.
> 
> fetchall() 
> 
>      Fetch all (remaining) rows of a query result, returning them as a
>      list of tuples. Note that the cursor's arraysize attribute can
>      affect the performance of this operation.

For the record, I've never liked this approach.  When I've done this
sort of thing before (for Ingres and Info; sorry, I can't share the
code, it was done while at USGS), I had selects return "result"
objects.  Result objects encapsulated cursors and gave them sequence
behavior.  As in:

  rows=aConnection.execute('select * from blech')
  # (Note, no explicit cursor objects)
  for row in rows:
     ... do something with the rows

Note that the "rows" object in this example is not a tuple
or list, but an object that lazily gets rows from the
result as needed.

Also note that the individual rows are not tuples, but 
objects that act as a sequence of values *and* as a mapping
from column name to value.  This lets you do something like:

  rows=aConnection.execute('select name, id from blech')
  for row in rows:
     print "%(name)s, %(id)s" % row

In my Ingres and Info interfaces, I also had the
rows have attributes (e.g. aRow.name), but then it's
hard for rows to have generic methods, like 'keys' and
'items'.  I also provided access to meta data for rows, 
something like:

  row.__schema__
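A minimal sketch of such a row object (the class name and internals are illustrative only):

```python
class Row:
    """Hypothetical result row: a sequence of values and a mapping
    from column name to value at the same time."""
    def __init__(self, names, values):
        self._names = list(names)
        self._values = list(values)
    def __len__(self):
        return len(self._values)
    def __getitem__(self, key):
        if isinstance(key, str):
            return self._values[self._names.index(key)]  # by column name
        return self._values[key]                         # by position
    def keys(self):
        return list(self._names)

row = Row(["name", "id"], ["foo", 1])
row[0]       # by position
row["id"]    # by name
"%(name)s, %(id)s" % row   # %-formatting works via __getitem__
```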
 
> nextset() 
> 
>      If the database supports returning multiple result sets, this
>      method will make the cursor skip to the next available set. If
>      there are no more sets, the method returns None. Otherwise, it
>      returns 1 and subsequent calls to the fetch methods will return
>      rows from the next result set. Database interface modules that
>      don't support this feature should always return None.

This feels a bit cumbersome to me.  What happens if you need
to iterate over multiple results simultaneously?  I'd rather
see an object for each result set and return a tuple of result
sets if there are more than one.
 
> setinputsizes(sizes) 
> 
>      Note: this method is not well-defined yet.  This can be used
>      before a call to execute() to predefine memory areas for the
>      operation's parameters. sizes is specified as a tuple -- one item
>      for each input parameter. The item should be a Type object that
>      corresponds to the input that will be used, or it should be an
>      integer specifying the maximum length of a string parameter. If
>      the item is None, then no predefined memory area will be reserved
>      for that column (this is useful to avoid predefined areas for
>      large inputs).
> 
>      This method would be used before the execute() method is invoked. 
> 
>      Note that this method is optional and is merely provided for
>      higher performance database interaction. Implementations are free
>      to do nothing and users are free to not use it.

See above.

> setoutputsize(size [,col]) 
> 
>      Note: this method is not well-defined yet.  Set a column buffer
>      size for fetches of large columns (e.g. LONG). The column is
>      specified as an index into the result tuple. Using a column of
>      None will set the default size for all large columns in the
>      cursor.
> 
>      This method would be used before the execute() method is invoked.
> 
>      Note that this method is optional and is merely provided for
>      higher performance database interaction. Implementations are free
>      to do nothing and users are free to not use it.

In the case of LONG columns, how is someone supposed to know the maximum
size ahead of time?  Does anyone really want this?
 
> DBI Helper Objects and Exceptions
> 
> Many databases need to have the input in a particular format for
> binding to an operation's input parameters. For example, if an input
> is destined for a DATE column, then it must be bound to the database
> in a particular string format. Similar problems exist for "Row ID"
> columns or large binary items (e.g. blobs or RAW columns). This
> presents problems for Python since the parameters to the execute()
> method are untyped.

They don't have to be. See above.

> When the database module sees a Python string
> object, it doesn't know if it should be bound as a simple CHAR column,
> as a raw binary item, or as a DATE.
> 
> To overcome this problem, the dbi interface module was created. This
> module, which every database module must provide, specifies some basic
> database interface types for working with databases. There are two
> classes: dbiDate and dbiRaw. These are simple container classes that
> wrap up a value. When passed to the database modules, the module can
> then detect that the input parameter is intended as a DATE or a
> RAW.

I suggest doing away with these through use of parameters like
%r for raw and %t for date time, or whatever.

> For symmetry, the database modules will return DATE and RAW
> columns as instances of these classes.

I'd rather see strings come back for RAW and "Date" objects
come back for dates.  I'd prefer to see the Date type be pluggable.
 
> A Cursor Object's description attribute returns information about each
> of the result columns of a query. The type_code is defined to be equal
> to one of five types exported by this module: STRING, RAW, NUMBER,
> DATE, or ROWID.

There needs to be a distinction between ints and floats.

> Note: The values returned in the description tuple need not be the
> same as the defined types, i.e. while coltype == STRING will always
> work, coltype is STRING may fail.

Why?
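Presumably the intent is to let a module expose type objects that compare equal to several underlying driver-level codes, in which case identity naturally fails.  A modern-Python sketch of that trick (all names invented):

```python
class DBTypeObject:
    """Hypothetical type object equal to any of several driver-level
    type codes -- why `is STRING` may fail while `== STRING` works."""
    def __init__(self, *values):
        self.values = values
    def __eq__(self, other):
        # Equal to itself, or to any of the wrapped driver-level codes.
        return other is self or other in self.values
    def __hash__(self):
        return hash(self.values)

STRING = DBTypeObject("CHAR", "VARCHAR")

coltype = "VARCHAR"   # what the driver might report in description
coltype == STRING     # equality matches the underlying code
coltype is STRING     # identity does not: they are different objects
```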
 
> The module exports the following functions and names: 
> 
> dbiDate(value) 
> 
>      This function constructs a dbiDate instance that holds a date
>      value. The value should be specified as an integer number of
>      seconds since the "epoch" (e.g. time.time()).
> 
> dbiRaw(value) 
> 
>      This function constructs a dbiRaw instance that holds a raw
>      (binary) value. The value should be specified as a Python string.
> 
> STRING 
> 
>      This object is used to describe columns in a database that are
>      string-based (e.g. CHAR).
> 
> RAW 
> 
>      This object is used to describe (large) binary columns in a
>      database (e.g. LONG RAW, blobs).
> 
> NUMBER 
> 
>      This object is used to describe numeric columns in a database.
> 
> DATE 
> 
>      This object is used to describe date columns in a database. 
> 
> ROWID 
> 
>      This object is used to describe the "Row ID" column in a
>      database.
> 
> The module also exports these exceptions that the DB module should
> raise:
> 
> Warning 
> 
>      Exception raised for important warnings like data truncations
>      while inserting, etc.
> 
> Error 
> 
>      Exception that is the base class of all other error
>      exceptions. You can use this to catch all errors with one single
>      'except' statement. Warnings are not considered errors and thus
>      should not use this class as base.
> 
> DataError 
> 
>      Exception raised for errors that are due to problems with the
>      processed data like division by zero, numeric out of range, etc.
> 
> OperationalError 
> 
>      Exception raised when an unexpected disconnect occurs, the
>      data source name is not found, etc.
> 
> IntegrityError 
> 
>      Exception raised when the relational integrity of the database is
>      affected, e.g. a foreign key check fails.
> 
> InternalError 
> 
>      Exception raised when the database encounters an internal error,
>      e.g. the cursor is not valid anymore, the transaction is out of
>      sync, etc.
> 
> ProgrammingError 
> 
>      Exception raised for programming errors, e.g. table not found or
>      already exists, etc.
> 
> Note: The values of these exceptions are not defined. They should give
> the user a fairly good idea of what went wrong though.
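For reference, the hierarchy the spec describes could be spelled out as plain Python classes, roughly like this (a sketch; note that this Warning deliberately shadows the builtin, per the spec's naming):

```python
class Warning(Exception):
    """Important warnings, e.g. data truncation while inserting.
    Deliberately not derived from Error."""

class Error(Exception):
    """Base class of all error exceptions; a single catch-all."""

class DataError(Error):
    """Problems with the processed data: division by zero, numeric
    value out of range, etc."""

class OperationalError(Error):
    """Unexpected disconnects, data source name not found, etc."""

class IntegrityError(Error):
    """Relational integrity violations, e.g. a failed foreign key check."""

class InternalError(Error):
    """Database-internal errors: invalid cursor, transaction out of sync."""

class ProgrammingError(Error):
    """Programming errors: table not found or already exists, etc."""
```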

If dbi exports a C API, it should do so through a Python
CObject.  This should avoid weird linking problems (like those
the oracledb and ctsybase modules have).

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.