[DB-SIG] First suggestion for db-api 3.0

Mon Sep 29 13:00:47 CEST 2008

On 2008-09-25 00:12, Vernon Cole wrote:
> On Wed, Sep 24, 2008 at 2:36 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>>  [...] There has been a lot of discussion about these
>> parameter styles. The last round concluded that we should strip down
>> the number of possible styles to 1 or 2 ('?' and ':1' IIRC).
>>
> 
> I think it was qmark and named. I agree with dropping the others. -- VC
> 
> [...] Also note that the spec does indeed allow the module to return
>> namedtuples, but it doesn't require this. The spec deliberately
>> uses the term "sequence" instead of tuple or list.
>>
>> Note that the implementation of namedtuples is not a particularly
>> nice one (they use dynamic programming). There's also a different
>> implementation available via the C API called structseq (this is used
>> e.g. by the time module).
>>
>> Both create subclasses of the standard Python tuple and are
>> sequences, so both are permitted to be used by the DB-API to
>> represent rows.
>>
> 
> So a dbapi 3.0 implementation could be a proper superset of dbapi 2.0. That
> would be best. -- VC
> 
> 
>> However, using the Python version in an API that potentially
>> returns a few thousand rows is not exactly what I'd recommend
>> to do, since it will slow down the interface a lot.
>>
>> Due to the nature of the namedtuples, you will also see many
>> different objects as row object (each query will return a
>> different object type).
>>
>> My favorite database (the one I help implement once upon a time) had the
> data organized as a virtual array of similar (same class) objects with each
> column as a named attribute. (Those are not the terms we used, since object
> oriented languages had not been invented then.) That's the model I am used
> to thinking in. Our implementation was pretty fast.
> 
> Adodbapi is already quite slow, so I doubt if the extra overhead of a named
> tuple would actually make that much difference to it. Perhaps I should
> download a 2.6 python and muck with it. Those of you working in C would be
> emulating the action of a named tuple for performance reasons, I expect,
> using your own objects. The implementation details would be unimportant,
> provided that the syntax is consistent.
> 
>   I only know that we are already requiring the user to know two languages:
> Python and SQL, in order to use dbapi. Selecting from among the existing ORM
> tool kits, and then learning the tool kit, is more of a learning curve than
> I could hack. I guess that's why I still don't use an Object Relational
> Mapper.
> 
>  I just want named access to my columns somehow -- so I don't add the
> complexity (and inevitable errors) of having to remember the column number
> of each field. The access to each column should feel like native Python

Please see the FAQ entry on this topic for arguments why this feature
didn't make it into the DB-API as requirement.

For some backends it may very well work out and those could use the
namedtuple approach or any other similar method such as providing
mapping like access to the row entries. For others, it may well
not work, e.g. if the database backend doesn't provide the result
set column name information at all or only using generated names
such as "column1", "column2", etc.

I usually use tuple unpacking to get the rows mapped back into
Python and have so far not had any major problems with it, e.g.

for row in cursor.fetchall():
    id, name, property1, property2, data = row

This solves multiple problems you can run into otherwise:

 * columns with generated names (such as aggregate function calls
   without "AS" renaming) are supported

 * column name casing as made available by the database doesn't
   matter

 * freedom to choose different names in Python than those used
   in the database

 * ability to address columns which do not have Python identifier
   compatible names (e.g. non-ASCII ones or names with embedded
   spaces)

 * local access to variable data (which is fast)

Other advantages:

 * tuple unpacking is a fast operation in Python (it's a native
   byte code)

 * by sticking to tuples, you can use pickle to send the data
   over the wire or store it in a file; if you use simple types,
   you can even use marshal instead of pickle

 * "automatic" detection of a mismatch between the number of
   result set columns sent by the database and the number expected
   by the Python program

The namedtuple access would only provide more flexibility if I were
to only use a subset of the result set columns... but then, I wouldn't
retrieve them from the database in the first place.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 29 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611