[DB-SIG] Controlling return types, again

M.-A. Lemburg mal at egenix.com
Sat May 19 12:13:59 CEST 2007


Hello Carsten,

> The important discussion on controlling return types has gone cold
> again, so I'd like to revive it. Revision 2 of my type mapping proposal
> was met with deafening silence except for valuable input by Jim
> Patterson, which I have incorporated into Revision 3. The result is
> available for your perusal once again at
> http://www.uniqsys.com/~carsten/typemap.html .
> 
> I don't know whether the general silence indicates tacit agreement or if
> people are too busy to respond or even just to read my proposal in the
> first place. I'd appreciate some feedback to see how close we are to
> reaching consensus, even if it's just a "show of hands" in the form of
> +1/0/-1 responses.

I did read your proposal and was about to reply, but then got
side-tracked again by other things.

In general, I think that we shouldn't expose/impose such a
low-level interface for type mapping. These details should be
left to the module author, and the user shouldn't have to
bother with them.

Instead, we should try to find an API with simple methods like
e.g. .setinputconverter(), .setoutputconverter() to let the user
define new mappings.

The module can then use an implementation like the one you
described internally.
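To make this concrete, here is a minimal sketch of what such a
registration API could look like. All names here (the
.setoutputconverter()/.getoutputconverter() methods, the placeholder
DECIMAL type code) are assumptions for illustration only, not part of
PEP 249 or any existing module:

```python
from decimal import Decimal

DECIMAL = 3  # placeholder type code; real codes are database specific

class Cursor:
    """Illustrative cursor that keeps its converter table internal."""

    def __init__(self):
        # internal mapping: database type code -> converter callable;
        # hidden from the user, as argued above
        self._output_converters = {}

    def setoutputconverter(self, dbtype, converter):
        """Register a callable applied to fetched values of this type."""
        self._output_converters[dbtype] = converter

    def getoutputconverter(self, dbtype):
        """Query the currently active converter for a type, if any."""
        return self._output_converters.get(dbtype)

    def _convert(self, dbtype, value):
        # called by the module's internal fetch machinery
        converter = self._output_converters.get(dbtype)
        if converter is None or value is None:
            return value
        return converter(value)

cur = Cursor()
cur.setoutputconverter(DECIMAL, Decimal)
print(cur._convert(DECIMAL, "12.50"))  # prints: 12.50 (a Decimal)
```

The point of the sketch is only that the user talks to two small
methods, while the dictionary stays a private implementation detail.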

What I like about the proposal is that it doesn't try to
overgeneralize things (a very common habit in Python design).

A few comments on the bullets in your proposal and a sketch
of a slightly modified solution:

    *  Connection and cursor objects will have an attribute called outputmap that maps objects from the database to the
application and an attribute called inputmap that maps objects from the application to the database. Note that both
mappings are between Python objects. The Python objects on the database side are mapped from and to actual database
values by the canonical mapping.

See above. I think we should not expose these low level mapping
tables to the user.

Note that databases typically differ in how they define the
"column type code". Since we already expose this type code in
the cursor.description field, we should probably use that as
the basis.

In any case, the type codes will be database specific. It won't
be possible to generically say: map integers to Python floats,
since "integers" may refer to a whole set of database types for
some backends or only to one type for others.

    *  The default mappings are None for efficiency, which means that only the canonical mapping is in effect. The same
can be achieved by an empty dictionary, but it's faster to check for None than to check for an empty dictionary.

That's implementation detail and should not be exposed.

We could add .getinputconverter() and .getoutputconverter()
to query the currently active mappings in some way.

    * When a cursor is created, it inherits shallow copies of the connection's mappings. This allows changing the type
mapping per cursor without affecting the connection-wide default.

+1

    * When a value is fetched from the database, if the value is not None, its column type (as it would be indicated in
cursor.description) is looked up in outputmap, and the resulting callable object is called upon to convert the fetched
value, as illustrated by this pseudo-code:

          converter = cursor.outputmap.get(dbtype, None)
          if converter is not None:
             fetched_value = converter(fetched_value)

There's a problem here: since fetching the database value from
the database will usually involve some more or less complicated
C binding code, you can't just pass the fetched_value to a
converter since this would mean that you already have a Python
object representing the value.

Normally, a database interface will have a set of different
techniques by which a value is fetched from the database, e.g.
fetch a number value as integer, long, string, float, decimal.

To make full use of converters, we'll have to be able to tell
the database module: fetch this number type as e.g. decimal
using the internal fetch mechanisms (phase 1) and then call
this converter on the resulting value (phase 2).

Hope I'm clear enough on this. If not, please let me know.

    * The mappings need not be actual mappings. They may be any object that implements __getitem__ and copy. This allows
specifying "batch" mappings that map many different types with the same callable object in a convenient fashion.

That again is an implementation detail. We should factor such
a type collection feature into the above methods.

My favorite would be not to set converters per type and then
have a mapping, but instead to have just one function
implementing phase 1, which then returns a converter function
to the database module to implement phase 2, e.g.

def mydecimalformat(value):
    return '%5.2f' % value

def outputconverter(cursor, position):
    dbtype = cursor.description[position][1]
    if dbtype == SQL.DECIMAL:
        # Fetch decimals as floats and call mydecimalformat on these
        return (SQL.DECIMAL, mydecimalformat)
    # All other types: fall back to the default fetch mechanism
    return None

mxODBC has a converter function for defining phase 1
output conversions and it works nicely. It doesn't have a
phase 2 implementation. Instead, it provides several attributes
for tuning the internal fetch mechanisms.
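To sketch how a module might drive such a phase-1 function internally:
before fetching, it calls the user's function once per column to decide
the fetch type and the phase-2 callable, caches those decisions, and
then applies them per row. The type codes, the (description, position)
signature and the row format below are all invented for the sake of a
self-contained example:

```python
# Hypothetical SQL type codes; real codes are database specific.
class SQL:
    DECIMAL = 1
    VARCHAR = 2

def mydecimalformat(value):
    return '%5.2f' % value

def outputconverter(description, position):
    # phase 1: pick fetch type and a phase-2 converter for one column
    dbtype = description[position][1]
    if dbtype == SQL.DECIMAL:
        return (SQL.DECIMAL, mydecimalformat)
    return None  # default fetch, no conversion

def fetch_all(description, raw_rows):
    # Resolve converters once per column, not once per value.
    plans = [outputconverter(description, i)
             for i in range(len(description))]
    converters = [plan[1] if plan else None for plan in plans]
    result = []
    for row in raw_rows:
        result.append(tuple(
            conv(v) if conv is not None and v is not None else v
            for conv, v in zip(converters, row)))
    return result

description = [('price', SQL.DECIMAL), ('name', SQL.VARCHAR)]
rows = fetch_all(description, [(1.5, 'apple'), (None, 'pear')])
print(rows)  # [(' 1.50', 'apple'), (None, 'pear')]
```

The per-column resolution step is exactly what makes the two-phase
scheme cheap: the loop over rows only applies already-chosen callables.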

    * For convenience, the module.connect() and connection.cursor() methods should accept outputmap and inputmap keyword
arguments that allow the application to specify non-default mappings at connection/cursor creation time.

Not sure about this: the type mapping setup should be explicitly done
after a proper connect. It may have to rely on the connection already
being established.

    * In discussions on the db-sig mailing list, some concern was raised that naming the directions of conversion as
input and output is ambiguous because input could mean into the database or into the application. However, PEP 249
already uses input and output in the naming of setinputsizes and setoutputsizes, and this proposal follows the same
semantics.

Right, let's use names similar to those. "input" is always the direction
from Python to the database (the database gets input) and "output" from
the database to Python (Python receives output).

    * When input binding is performed and the cursor's inputmap is not None, a converter function is looked up in the
inputmap according to the following pseudo-code:

          converter = None
          for tp in type(in_param).__mro__:
              converter = cursor.inputmap.get(tp, None)
              if converter is not None:
                  break
          if converter is not None:
              in_param = converter(in_param)

This will cause a serious performance hit since you have to do
this for every single value fetched from the database. Just think
of a result set of 20 columns and 10000 rows.

You'd have to run through the above for-loop 200000 times, even
though it's really only needed once per column (since the types
won't change within the result set).

The above two-phase approach avoids this.
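For completeness, even within the proposed per-value scheme the MRO
walk could be amortized by caching the lookup result per Python type,
so it runs once per distinct type rather than once per value. The
helper name below is invented for illustration:

```python
def make_input_binder(inputmap):
    """Return a bind function that caches the MRO lookup per type."""
    cache = {}  # Python type -> converter callable (or None)

    def bind(in_param):
        tp = type(in_param)
        try:
            converter = cache[tp]
        except KeyError:
            converter = None
            for base in tp.__mro__:
                converter = inputmap.get(base)
                if converter is not None:
                    break
            cache[tp] = converter  # remember even "no converter found"
        if converter is not None:
            return converter(in_param)
        return in_param

    return bind

# usage: bool subclasses int, so the int converter also catches bools
bind = make_input_binder({int: str})
print(bind(42), bind(True), bind(1.5))  # 42 True 1.5 (first two as str)
```

This only softens the cost, though; it still pays a dictionary probe
per value, whereas the two-phase approach does the decision work once
per column.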

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 19 2007)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

