[DB-SIG] DB API 3.0 strawman

Stuart Bishop zen@shangri-la.dropbear.id.au
Mon, 6 Aug 2001 02:19:09 +1000 (EST)


Here is an update to the DB API update I posted to this
list a few weeks ago. It has moved its focus away from
being a wrapper for 1.0 and 2.0 compliant drivers,
and more into a new DB API standard (although this wrapper
functionality is still there).

I'm after feedback, particularly if this is the direction to
head and if it should proceed to an official PEP.


PEP: TBA
Title: DB API 3.0
Version: $Revision: 1.9 $
Author: zen@shangri-la.dropbear.id.au (Stuart Bishop)
Discussions-To: db-sig@python.org
Status: strawman
Type: Standards Track
Requires: 234,248,249, Date PEP
Created: 18-May-2001
Post-History: Never


Abstract

    Python has had a solid Database API for SQL databases since 1996 [1].
    This revision introduces iterator support and attempts to fix
    known issues with the interface. This version provides a module
    to be integrated into the core Python distribution, which provides
    the interface to 3rd party RDBMS drivers. This module can use
    DB API 1.0 and 2.0 compliant modules as the drivers, although it
    it expected that these drivers will be updated to support this
    API more fully.


Copyright

    This document has been placed in the public domain, except for
    those sections attributed to other sources.


Specification

    Use of this wrapper requires a DB API v2.0 compliant driver to
    be installed in sys.path. The wrapper may support DB API v1.0
    drivers (to be determined).

    Module Interface (sql.py)

	connect(driver,dsn,user,password,host,database)

	    driver -- The name of the DB API compliant driver module.
	    dsn -- Datasource Name
	    user -- username (optional)
	    password - password (optional)
	    host -- hostname or network address (optional)
	    database -- database name (optional)

	    Returns a Connection object. This will be a wrapper object
	    for DB API 1.0 or DB API 2.0 compliant drivers. DB API 3.0
	    compliant drivers will have a Connection instance returned
	    unmodified.

            Does the connect function need to accept arbitrary keyword
            arguments?

	Exceptions (unchanged from DB API 2.0)

	    Exceptions thrown need to be subclasses of
	    those defined in the sql module, to allow calling code
	    to catch them without knowning which driver module
	    they were thrown from. This is particularly of use
	    for code that is connecting to multiple databases
	    using different drivers.

    Connection Object

	close()
	commit()
	cursor()

	    As per DB API 2.0.

	rollback(savepoint=None)

	    Rollback the current transaction entirely, or to the given
	    savepoint if a savepoint is passed. The savepoint object
	    must have been returned by the savepoint method during the
	    current tranaction.

	    The rollback method will raise a NotSupportedError exception
	    if the driver doesn't support transactions.

	quote(object)

	    Returns a ANSI SQL quoted version of the given value as a
	    string.  For example:

		>>> print con.quote(42)
		42
		>>> print con.quote("Don't do that!")
		'Don''t do that!'

	    Note that quoted dates often have a RDBMS dependant syntax
	    (eg. "TO_DATE('01-Jan-2001 12:01','DD-MMM-YYYY H24:MI')" for
	    Oracle). I need to track down the official ANSI SQL 1992
	    or 1999 compliant syntax for specifying date/time/and datetime
	    datatype as strings (if it exists).

	    The quote method is invaluable for generating logs of SQL
	    commands or for dynamically generating SQL queries.

	    This method may be made a module level function as opposed
	    to a Connection method if it can be show that string quoting
	    will always be RDBMS independant.

	execute(operation,[seq_of_parameters])

	    A Cursor is created and its execute method is called.
	    Returns the newly created cursor.

	    for row in con.execute('select actor,year from sketches'):
		[ ... ]

            Insert note abour cursor creation overheads and how to
	    avoid here.

	    What should the default format be for bind variables?

	driver_connection()

	    Returns the unwrapped connection object if the driver
	    is a DB API 1.0 or DB API 2.0 compliant driver. This iis
	    required for accessing RDBMS specific features.
	    Returns self if the connection object is unwrapped.

	capabilities

	    A dictionary describing the capabilities of the driver.
	    Currently defined values are:

		apilevel

		    String constant stating the supported DB API level
		    of the driver. Currently only the strings '1.0',
		    '2.0' and '3.0' are allowed.

		threadsafety

		    As per DB API 2.0.

		rollback

		    1 if the driver supports the rollback() method
		    0 if the driver does not support the rollback() method

		nextset

		    1 if the driver's cursors supports the nextset() method
		    0 if the nextset() method is not supported.

		default_transaction_level

		    The default transaction isolation level for this
		    driver.

	set_transaction_level(level)

	    Set the transaction isolation level to the given level,
	    or a more restrictive isolation level. A NotSupported
	    exception is raised if the given or more restrictive
	    isolation level cannot be set.

	    Returns the transaction isolation level actually set.

	autocommit(flag)

	    Do we want an autocommit method? It would need to default
	    to false for backwards compatibility, and remembering
	    to turn autocommit on is as easy as explicitly calling
	    commit().

	savepoint()

	    Sets a savepoint in the current transaction, or throws a
	    NotSupportedError exception. Returns an object which may
	    be passed to the rollback method to rollback the transaction
	    to this point.

    Cursor Object

	The cursor object becomes an iterator after its execute
	method has been called. Rows are retrieved using the drivers
	fetchmany(arraysize) method.

	execute(operation,sequence_of_parameters)

            As per the executemany method in the DB API 2.0 spec.
	    Is there any need for both execute and executemany in this
	    API? Setting arraysize to 1 will call the drivers fetch() method
	    rather than fetchmany().

	    Returns self, so the following code is valid

	    for row in mycursor.execute('select actor,year from sketches'):
		[ ... ]

	    What should the default format be for bind variables?

	callproc(procedure,[parameters])
	arraysize
	description
	rowcount
	close()
	setinputsizes(size)
	setoutputsizes(size[,column]

	    As per DB API 2.0

	driver_cursor()

	    Return the unwrapped cursor object as produced by the driver.
	    This may be required to access driver specific features.

	next()

	    Return the Row object for the next row from the currently
	    executing SQL statement. As per DB API 2.0 spec, except
	    a StopIteration exception is raised when the result set
	    is exhausted.

	__iter__()

	    Returns self.

	nextset()

	    I guess as per DB API 2.0 spec.

	warnings

	    List of Warning exceptions generated so far by the currently
	    executing statement. This list is cleared automatically by the
	    execute and callproc methods.

	clear_warnings()

	    Erase the list of warnings. Same as 'del cursor.warnings[:]'
	    Is this method required?

	raise_warnings

	    If set to 0, Warning exceptions are not raised and instead
	    only appended to the warnings list.

	    If set to 1, Warning excepions are raised as they occur.

	    Defaults to 1

    Row Object

	When a Cursor is iterated over, it returns Row objects.
	dtuple.py (http://www.lyra.org/greg/python/) provides such an
	implementation already.

	[index_or_key]

	    Retrieve a field from the Row. If index_or_key is an integer,
	    the column of the field is referenced by number (with the first
	    column index 0). If index_or_key is a string, the column is
	    referenced by its lowercased name (lowercased to avoid problems
	    with the differing methods vendors use to capitalize their column
	    names).

    Type Objects and Constructors

	As per DB API 2.0 spec.

	Do we need Date, Time and DateTime classes, or just DateTime?

	Need to author the 'Date PEP'. It would be nice if there was a
	more intelligent standard Date class in the Python core that
	we could leverage. If we don't, people will start using the
	sql module for the more intelligent DateTime object it would
	need to provide even if they arn't using databases.

    ConnectionPool Object

	A connection pool, to be documented.

    Transaction Isolation Levels

	Insert description of dirty reads, non repeatable reads and
	phantom reads, probably stolen from JDBC 3.0 documentation.

	The following symbolic constants are defined to describe the
	four transaction isolation levels defined by SQL99. Note that
	they are in numeric order so comparison operators can be
	safely used to test for restrictivness. Note that these definitions
	are taken from the JDBC API 3.0 documentation by Sun.

	TRANSACTION_NONE

	    No transaction isolation is guarenteed.

	TRANSACTION_READ_UNCOMMITTED

	    Allows transactions to see uncommitted changes to the data.
	    This means that dirty reads, nonrepeatable reads, and
	    phantom reads are possible.

	TRANSACTION_READ_COMMITTED

	    Means that any changes made inside a transaction are not visible
	    outside the transaction until the transaction is committed. This
	    prevents dirty reads, but nonrepeatable reads and phantom
	    reads are still possible.

	TRANSACTION_REPEATABLE_READ

	    Disallows dirty reads and nonrepeatable reads. Phantom read are
	    still possible.

	TRANSACTION_SERIALIZABLE

	    Specifies that dirty reads, nonrepeatable reads, and phantom
	    reads are prevented.

    Test Suite

	A common test suite will be part of the implementation, to
	allow driver authors and driver evaluators to test and excercise
	the systems. Two possbilities to start with are the test suites
	in psycopg by Federico Di Gregorio and mxODBC by eGenix.com Software.
	Example DDL for various systems will need to be provided.


Rationale

    The module is called sql.py, to avoid any ambiguity with non-realational
    or non-SQL compliant database interfaces. This also nicely limits
    the scope of the project. Other suggestions are 'sqlutil' and 'rdbms',
    since 'sql' may refer to the language itself.

    Previous versions of the DB API have been intentially kept lean to
    make it simpler to develop and maintain drivers, as a richer feature
    set could be implemented by a higher level wrapper and maintained in
    a single place rather than in every API compliant driver. As this
    revision provides a single place to maintain code, these features
    can now be provided without placing a burden to the driver authors.

    Existing DB API 1.0 and 2.0 drivers can be used to power the backend
    of this module. This means that there will be a full compliment of
    drivers available from day 1 of this modules release without placing
    a burden on driver developers and maintainers.

    The core of the API is identical to the Python Database API v2.0
    (PEP-249). This API is already familiar to Python programers and
    is a proven solid foundation.

    Python previously defined a common relational database API that
    was implemented by all drivers, and application programmers accessed
    the drivers directly. This caused the following issues:

	It was difficult to write code that connected to multiple
	database vendor's databases. Each seperate driver used defined
	its own heirarchy of exceptions that needed to be handled,
	their own datetime class and their own set of datatype constants.

	Platform independant code could not be written simply,
	due to the differing paramater styles used by bind variables.
	This also caused problems with publishing example code and tutorials.

	The API remained minimal, as any new features would need to
	be implemented by all driver maintainers. The DB-SIG felt
	most feature suggestions would be better implemented by higher
	level wrappers, such as that defined by this PEP.


Language Comparison

    The design proposed in this PEP is the same as that used by Java,
    where a relational database API is shipped as part of the core language,
    but requires the installation of 3rd party drivers to be used.

    Perl has a similar arrangement of API and driver separation. Perl
    does not ship with PerlDBI or drivers as part of the core language,
    but it is well documented and trivial to add using the CPAN tools.

    PHP has a seperate API for each database vendor, although work is
    underway (completed?) to define a common abstraction layer
    similar to Java, Perl and Python. All of the drivers ship as
    part of the core language.


References

    [1] PEP-248 and PEP-249


-- 
Stuart Bishop <zen@shangri-la.dropbear.id.au>