[DB-SIG] Our first contribution: a row class

Tue, 12 Feb 2002 16:39:34 -0500 (EST)

Here is my first code contribution and discussion-piece.  It can be
downloaded from:

  http://opensource.theopalgroup.com/

I've thrown together a very simple starting page, that will hopefully turn
into a fairly nice hub of Python activity.

----------------------------------------------------------------------------
Here is a description from the db_row.py module:

This module defines light-weight objects suitable for many applications,
though the primary goal of the implementer is for storage of database query
results.  The primary design criteria for the data-structure where:

  1) store a sequence of arbitrary Python objects
  2) the number of items stored in each instance should be constant
  3) each instance must be as light-weight as possible, since many thousands
     of them could be created
  4) values must be retrievable by index
     e.g.: d[3]
  5) values must be retrievable by a field name using the Python attribute
     syntax: e.g.: d.field
  6) values must be retrievable by a field name using the Python item
     syntax: e.g.: d['field']
  7) optionally, operations using field names should be case-insensitive
     e.g.: d['FiElD']
  8) should support standard list and dictionary -like interfaces, including
     slicing
  9) should be convertible to a list, tuple or dictionary

These criteria are chosen to simplify access to rows that are returned from
database queries.  Lets say that you run this query:

  cursor.execute('SELECT a,b,c FROM blah;')
  results = cursor.fetchall()

The resulting data-structure is typically a list if row tuples. e.g.:

  results = [ (1,2,3), (3,4,5), (6,7,8) ]

While entirely functional, these data types only allow integer indexed
access.  e.g., to query the 'b' attribute of the second row:

  b = results[1][1]

This requires that all query results are accessed by index, which can be
very tedious and the code using this technique tends to be hard to maintain.
The alternative has always been to return a list of native Python
dictionaries, one for each row.  e.g.:

  results = [ {'a':1,'b':2,'c':3}, {'a':3,'b':4,'c':5},
              {'a':6,'b':7,'c':8} ]

This has the advantage of easier access to attributes by name, e.g.:

  b = results[1]['b']

There are several serious disadvantages:

  1) each row requires a heavy-weight dictionary _per instance_.  This can
     damage performance when returning, say, 100,000 rows from a query.

  2) access by index is lost since Python dictionaries are unordered.

  3) attribute-access syntax is somewhat sub-optimal (or at least
     inflexible) since it must use the item-access syntax.

     i.e., x['a'] vs. x.a.

The second and third problems can be addressed by creating a subclass of
UserDict (a Python class that looks and acts like a dictionary), though that
only magnifies the performance problems.

HOWEVER, there are some new features in Python 2.2 that can provide the best
of all possible worlds.  Here is an example:

  # Create a new class type to store the results from our query
  # (we'll make field names case-insensitive just to show off)
  > R=make_row_class(['a','b','c'], insensitive = 1)

  # Create an instance of our new tuple class with values 1,2,3
  > r=R( (1,2,3) )

  # Demonstrate all three accessor types
  > print r['a'], r[1], r.c
  1 2 3

  # Demonstrate all case-insensitive operation
  > print r['a'], r['A']
  1 1

  # Return the keys (column names)
  > print r.keys()
  ('a', 'b', 'c')

  # Return the values
  > print r.values()
  [1, 2, 3]

  # Return a list of keys and values
  > print r.items()
  [('a', 1), ('b', 2), ('c', 3)]

  # Return a dictionary of the keys and values
  > print r.dict()
  {'a': 1, 'c': 3, 'b': 2}

  # Demonstrate slicing behavior
  > print r[1:3]
  [2, 3]

This solution uses some new Python 2.2 features and ends up allocating only
one dictionary _per row class__, not per row instance.  i.e., the row
instances do not allocate a dictionary at all!  This is accomplished using
the new-style object 'slots' mechanism.

Here is how you could use these objects:

  cursor.execute('SELECT a,b,c FROM blah;')

  # Build the field list from the field names returned by the query
  fields = [ d[0] for d in cursor.description ]

  # Make a class to store the resulting rows
  R = make_row_class(fields, insensitive = 1)

  # Build the rows from the row class and each tuple returned from the cursor
  results = [ R(row) for row in cursor.fetchall() ]

  print results[1].b, results[2].B, results[3]['b'], results[2][1]

Open implementation issues:

  o This implementation will likely break when Python 2.3 comes out, since
    super will become a keyword and possibly due to other syntactic changes.
    The code will be trivial to fix, so this is not a big concern.

  o Values are currently mutable.  This opens the door to several problems:

     1) .items(), .values() and .keys() do not skip slots that do not have
        values assigned.  This is so that the field indices will always be
        consistent.  Instead, they will represent missing values with 'None'
        objects.  e.g.:

          > R=make_row_class(['a','b','c'], insensitive = 1)
          > r=R([1,2,3])
          > print r.items()
          [('a', 1), ('b', 2), ('c', 3)]
          > del r[:]
          > print r.items()
          [('a', None), ('b', None), ('c', None)]

     2) Row equality and hashing are open issues.  I do not intend to
        compare rows or store them in dictionaries, so this does not bother
        me much.  Others may want to, so maybe it is desirable to have a
        mutable and immutable instance types.

   o The current code is a little schitzophrenic about returning lists and
     tuples.  Some operations return one, some the other.  We should pick
     one way and stick to it.

   o More doc-strings are needed, including dynamic row class doc-strings

   o Add integrated unit-tests (a la doctest, most likely)

   o Maybe some better example code

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com