[DB-SIG] SQL PEP for discussion

Stuart Bishop zen@shangri-la.dropbear.id.au
Sun, 17 Jun 2001 23:47:42 +1000 (EST)


I've put togther a PEP for a standard wrapper for DB compliant
drivers. I'd like to get peoples thoughts on it before I submit
it anywhere. In particular, if the SIG thinks this is the
correct approach to take before I do further work on it.

I'd also like to find out if there are similar projects that
have had work started on them, as they may make this work
irrelevant.


PEP: TBA
Title: Standard Wrapper For SQL Databases
Version: $Revision: 1.4 $
Author: zen@shangri-la.dropbear.id.au (Stuart Bishop)
Discussions-To: db-sig@python.org
Status: pre submission Draft
Type: Standards Track
Requires: 234,248,249
Created: 17-Jun-2001
Post-History: Never


Abstract

    Python has had a solid Database API for SQL databases since 1996 [1].
    This API has been intentially kept lean to make it simpler to
    develop and maintain drivers, as a richer feature set could be
    implemented by a higher level wrapper and maintained in
    a single place rather than in every API compliant driver.
    The goal of this PEP is to define and implement such a wrapper,
    and merge it into the standard Python distribution.


Copyright

    This document has been placed in the public domain.


Specification

    Use of this wrapper requires a DB API v2.0 compliant driver to
    be installed in sys.path. The wrapper may support DB API v1.0
    drivers (to be determined).

    Module Interface (sql.py)

	connect(driver,dsn,user,password,host,database)

	    driver -- The name of the DB API compliant driver module.
	    dsn -- Datasource Name
	    user -- username (optional)
	    password - password (optional)
	    host -- hostname or network address (optional)
	    database -- database name (optional)

	    Returns a Connection object

            Does the connect function need to accept arbitrary keyword
            arguments?

	Exceptions (unchanged from DB API 2.0)

	    Exceptions thrown need to be subclasses of those
	    defined in the sql module, to allow calling code to
	    catch them without knowning which driver module
	    they were thrown from. This is particularly of use
	    for code that is connecting to multiple databases
	    using different drivers.


    Connection Object

	close()
	commit()
	rollback()
	cursor()

	    As per DB API 2.0. The rollback method will raise a
	    NotSupportedError exception if the driver doesn't support
	    transactions.

	quote(object)

	    Returns a ANSI SQL quoted version of the given value as a
	    string.  For example:

		>>> print con.quote(42)
		42
		>>> print con.quote("Don't do that!")
		'Don''t do that!'

	    Note that this cannot currently be done for dates, which
	    generally have a RDBMS dependant syntax. This would need
	    to be resolved in the next version of the DB API spec, and
	    is the reason why this is the method of the connection object
	    as opposed to a function. A quote method is invaluable for
	    generating logs of SQL commands or for dynamically generating
	    SQL queries.

	execute(operation,[seq_of_parameters])

	    A Cursor is created and its execute method is called.
	    Returns the newly created cursor.

	    for row in con.execute('select actor,year from sketches'):
		[ ... ]

            Insert note abour cursor creation overheads and how to
	    avoid here.

	    Should cursor handling be hidden? It would be quite possible
	    to have the connection object maintain a pool of cursors which
	    are used to create iterators. The iterator would return the
	    cursor to the pool when iteration is completed, or in its
	    destructor.

	    What should the default format be for bind variables?

	driver_connection()

	    Return the unwrapped connection object as produced by the
	    driver. This may be required for accessing RDBMS specific
	    features.

	capabilities

	    A dictionary describing the capabilities of the driver.
	    Currently defined values are:

		apilevel

		    String constant stating the supported DB API level
		    of the driver. Currently only the strings '1.0' and
		    '2.0' are allowed.

		threadsafety

		    As per DB API 2.0. Irrelevant if we can trasparently
		    enforce good threading.

		rollback

		    1 if the driver supports the rollback() method
		    0 if the driver does not support the rollback() method

		nextset

		    1 if the driver's cursors supports the nextset() method
		    0 if the nextset() method is not supported.


    Cursor Object

	The cursor object becomes an iterator after its execute
	method has been called. Rows are retrieved using the drivers
	fetchmany(arraysize) method.

	execute(operation,sequence_of_parameters)

            As per the executemany method in the DB API 2.0 spec.
	    Is there any need for both execute and executemany in this
	    API?

	    Returns self, so the following code is valid

	    for row in mycursor.execute('select actor,year from sketches'):
		[ ... ]

	    What should the default format be for bind variables?

	callproc(procedure,[parameters])
	arraysize
	description
	rowcount
	close()
	setinputsizes(size)
	setoutputsizes(size[,column]

	    As per DB API 2.0. Perhaps descriptions should be
	    trimmed to the basics of column name & datatype?

	driver_cursor()

	    Return the unwrapped cursor object as produced by the driver.
	    This may be required to access driver specific features.

	next()

	    Return the Row object for the next row from the currently
	    executing SQL statement. As per DB API 2.0 spec, except
	    a StopIteration exception is raised when the result set
	    is exhausted.

	__iter__()

	    Returns self.

	nextset()

	    I guess as per DB API 2.0 spec.


    Row Object

	When a Cursor is iterated over, it returns Row objects.

	[index_or_key]

	    Retrieve a field from the Row. If index_or_key is an integer,
	    the column of the field is referenced by number (with the first
	    column index 0). If index_or_key is a string, the column is
	    referenced by name.

            Note that referencing columns by name may cause problems if you
	    are trying to write platform independant code and should be
	    avoided, as different vendors capitalize their column names
	    differently.


    Type Objects and Constructors

	As per DB API 2.0 spec.

	It would be nice if there was a more intelligent standard Date
	class in the Python core that we could leverage. It is probably
	worth putting this in another PEP that we would depend on.


Rationale

    The module is called sql.py, to avoid any ambiguity with non-realational
    or non-SQL compliant database interfaces. This also nicely limits
    the scope of the project.

    RDBMS abstraction classes, such as Python dictionary -> RDBMS table
    mappings, object persistance, connection pools etc. are better suited
    to other modules (or submodules) and are _currently_ beyond the scope of
    this PEP.

    The core of the API is identical to the Python Database Interface v2.0
    (PEP-249). This API is already familiar to Python programers and
    is a proven solid foundation. To this core I have added some helper
    functions and iterator support (PEP-234).

    Python previously defined a common relational database API that
    was implemented by all drivers, and application programmers accessed
    the drivers directly. This caused the following issues:

	It was difficult to write code that connected to multiple
	database vendor's databases. Each seperate driver used defined
	its own heirarchy of exceptions that needed to be handled, and
	similar problems occurred with identifying column datatypes etc.

	Platform independant code could not be written simply,
	due to the differing paramater styles used by bind variables.
	This also caused problems with publishing example code and tutorials.

	The API remained minimal, as any new features would need to
	be implemented by all driver maintainers. The DB-SIG felt
	most feature suggestions would be better implemented by higher
	level wrappers, such as that defined by this PEP.

	More than a few people didn't realise that Python _had_ a database
	API, as it required people to find the DB-SIG on python.org.

    The wrapper will ease writing RDBMS independant code:

	The wrapper will enforce thread safety. It is unknown at this stage
	if this will be done by passing calls to a single worker thread for
	non thread safe drivers, or by raising an exception if an attempt
	is made to use a driver in a unsafe manner.
	(Can we enforce thread safety? Thread ID's get reused, so it is
	 possible for the following to happen: Thread ID 10 is created
		Thread ID 10 creates a cursor 'c'
		Thread ID stops running
		A new thread created, and happens to have thread ID 10 again.
		The new thread ID 10 attempts to use cursor 'c'
	 Fixing this might require another PEP, and probably is impossible if
	 we allow for threads being spawned from C code.)

	The driver name is passed to the connect method as a string,
	rather than importing the driver module, to allow the driver being
	used by an application to be defined in a configuration file or
	other resource file.

	Bound parameters are specified in a common format, and translated
	to the native format used by the driver.

	Some basic guidelines on writing RDBMS independant code will
	be provided in the documentation (this is not always obvious
	to developers who only work with one vendor's database system).

	Only one heirarchy of exceptions needs to be caught,
	as opposed to one heirarchy per driver being used.


Language Comparison

    The design proposed in this PEP is the same as that used by Java,
    where a relational database API is shipped as part of the core language,
    but requires the installation of 3rd party drivers to be used.

    Perl has a similar arrangement of API and driver separation. Perl
    does not ship with PerlDBI or drivers as part of the core language,
    but it is well documented and trivial to add using the CPAN tools.

    PHP has a seperate API for each database vendor, although work is
    underway (completed?) to define a common abstraction layer
    similar to Java, Perl and Python. All of the drivers ship as
    part of the core language.


Gadfly

    Gadfly is a RDBMS written in Python by (insert name of author when I'm
    online). If Gadfly was also included in the Python distribution, it would
    be extremly useful as a learning tool. It would also allow the
    creation of products that require a RDBMS back end, yet work with
    any standard Python installation. Gadfly has already been shipped
    as part of Digital Creation's Zope for similar reasons. Shipping
    a fully functional RDBMS with Python would also be useful for the
    nyer-nyer factor in language advocacy.

    Should Gadfly be included in the core Python distribution as a
    default RDBMS?

    Hmm... this is beyond the scope of the abstract and should be a
    seperate PEP (esp as it isn't dependant on this PEP). I'll leave
    it here for now as a discussion point.


References

    [1] PEP-248 and PEP-249


-- 
Stuart Bishop <zen@shangri-la.dropbear.id.au>