[Tutor] overview of how data is handled [getting dictionaries from sql cursor / generators] (fwd)

Thu May 20 05:46:19 EDT 2004

Danny Yoo wrote:

>[Forwarding to tutor at python.org: please keep tutor at python.org in CC.  It
>gives other folks the opportunity to help answer your questions.]
>  
>
<snipped forwarded message, answering to reply

>my code
>mylist=[]
>myrowdata=[]
>for count in range(0,len(myfields)):
>   mylist.append(myfields[count][0])
>
>for count in range(0,len(mydata[0])):
>    myrowdata.append(mydata[0][count] )
>
>d = dict(zip(mylist,myrowdata))
>The above works with only one record at a time.
>
>I'm using PyPgSQL and the resultset is a list.
>mydata=cursor.fetchmany(10)
> Because the of books I'm reading (O'Rielly) I was expecting a tuple but
>I got a list.   OK but now the routines in the books suggest that I use
>a dict.  So I figured out how to convert the data (mydata in this case)
>into a dict.  But then I realized that the two loops I was going to use
>(to convert the list into a dict) will take time.  I did not test how
>long.  But I also realized that converting has to be wrong.  The
>returned data must be used as a list - not converted  and then used.
>But  it appears that's what you are doing in the "getDictsFromCursor"  -
>converting into a dict ( I know to show me).
>
>  
>
It's probably a good idea to stress again that the (list of) objects 
returned
by cursor.fetchone() (cursor.fetchmany())  behave like a dict. It is 
neither
a tuple, nor a list or dict. (It's a PgResultSet object). However, in the
context of what you're trying to do, these object are read-only. That 
is, you
can't change them, and even if you transform them into a list (or dict) and
change the values, there is no way these values are going to be 
automagically
updated in the the database.

You will need additional code for that (see below).

>In the java code you are looping thru the resultset.  That's all I want
>to do - at first.  Then I'll need some way to loop thru the data based
>on the field name.  Again it looks like I have to create the dict for
>each record.  This just can't be right.  Fetchone 1000's of times is
>also got to be wrong.   In the VFP world we have a "SCAN" that loops
>thru the resultset.  I can access the data by field name in VFP. How
>does one do same thing - loop thru the resultset (could be 1000's) and
>access the fields by their field name?
>
>  
>
If the backend of VFP is an sql database (I wouldn't know) you can 
assume that
VFP is doing something like what I show below under the covers.

>One nice thing about getting a list is I can change the data in the list
>(can't in a tuple).  so how about something like:
>replace all extPrice with Qty * price (this is a very simple thing to do
>in VFP,  Delphi or Java.) in the resultset. Then open a connection and
>save the changed data.
>  
>
You're not going to be able to do this, even after you have changed the 
result
set from as 'tuple' into a list, to make it editable. The reason that 
this is
not trivial, is that the result set does not know from which row in the
database it came, so it's not obvious which row in the database has to be
updated once you have changed a row in the result set. And of course, 
after you
have changed an item from the resultset into a list, the list will not know
anything whatsoever about any database, making this problem even more
difficult.

>What I think has to happen at the moment is each record must be
>converted to a dict and then after I can change each record based on the
>key of the dict.  This does not sound right - I'm doing something wrong.
>John
>
>
>  
>
>
The following code is untested and uses a lot of different python
concepts that you may not have encountered yet. But I hope this gives
an idea about the kind of work that you have to do to make this work
the way you want.

class DBRow:
    """An editable row that remembers from which row in the table it came"""
    def __init__(self, oid, cols, data):
        self.oid = oid      # row identifier
        self.cols = cols    # column names (a list)
        self.data = data    # row data (a list)
        self.dirty = 0      # only save when updated

    def __getitem__(self, key):
        # used to retrieve a value for one column in this row
        index = self.cols.index(key)
        return self.data[index]

    def __setitem__(self, key, value):
        # used to set a value for one columns in this row
        index = self.cols.index(key)
        self.data[index] = value
        self.dirty = 1

    def save(self, table_name, cursor):
        if not self.dirty:
            return

        terms = []
        for col, val in zip(self.cols, self.data)
             terms.append("%s=%s" % (col, val))

        cmnd = "UPDATE %s SET %s WHERE oid=%s"  % (table_name, terms, 
self.oid)
        cursor.execute(cmnd)

class DBTable:
    """Represent the table as a list of editable DBRows"""
    def __init__(self, db, table_name, clause=''):
        self.table_name = table_name
        self.db = db
        self.rows = []

        # get the data from the db
        cmnd = "SELECT oid, * FROM %s" % table_name
        if clause:
            cmnd = cmnd+" WHERE %s" % clause
        c = db.cursor()
        c.execute(cmnd)

        # find names, except oid
        colnames = [d[0] for d in c.description[1:]]

        # transform the resultset in a list of DBRow objects
        for row in c.fetchall():
            oid = row[0]
            data = row[1:]
            self.rows.append(DBRow(oid, colnames, data))

    def __getitem__(self, index):
        return self.rows[index]

    def save(self):
        c = self.db.cursor()
        for r in self.rows:
            r.save(self.table_name, c)
        c.commit()

With this you should be able to do something like:

 >>> db = PgSQL.connect(...)
 >>> data = DBTable(db, "mytable", "aaa>42")
 >>> for row in data:
...     row['aaa'] = row['bbb']+3*row['ccc']
 >>> data.save()

which would change column aaa for all rows where aaa>42

Note that

db = PgSQL.connect(...)
c = db.cursor()
c.execute("UPDATE mytable SET aaa=bbb+3*ccc WHERE aaa>42")
c.commit()

Is by far the fastest (if not easiest) way to solve this particular
problem.

Hope this helps,

Roeland