Need Advice With Arrays and Calculating Eigenvectors
As a newcomer to python, NumPy, and SciPy I need to learn how to most efficiently manipulate data. Today's need is reducing a list of tuples to three smaller lists of tuples, creating a symmetrical matrix from each list, and calculating the Eigenvector of each of the three symmetrical matrices.

The starting list contains 9 tuples. Each tuple has 30 items: a category name, a subcategory name, and 28 floats. These were selected from a database, and a tuple looks like this one:

    (u'soc', u'pro', 1.3196923076923075, 3.8109999999999999,
     1.6943846153846154, 2.7393076923076922, 3.825538461538462,
     5.0640769230769234, 3.609923076923077, 3.1429999999999998,
     1.5936153846153849, 1.4893846153846153, 2.6563076923076929,
     2.2156923076923074, 3.7973076923076921, 2.6884615384615387,
     2.7008461538461543, 3.4992307692307687, 2.3813846153846154,
     3.2199230769230769, 1.7726923076923078, 2.9855384615384613,
     2.8829230769230771, 3.7862307692307695, 2.3791538461538462,
     4.0949230769230773, 2.8703846153846153, 2.8296923076923073,
     3.319230769230769, 1.8083076923076922)

The three 'soc' subcategories need to have each of the 28 floats averaged and assigned to another tuple for 'soc'; same with the other two categories. That produces a list of three tuples, each with 29 items.

Each of these 28 floats represents the average of a pair-wise comparison of values (in the non-numeric sense). So the first float above represents the cell (1,2), the second float represents the value of the cell (1,3) and so on. The diagonal of the matrix is 1.

When I have these three symmetrical matrices, I want to call eigen() on each one to calculate the principal Eigenvector.

I can think of indirect ways of doing all this, but I'm sure that there are much more efficient approaches known to those who have done this before. So, I'd like your suggestions and recommendations. Of course, if I've not clearly explained my needs, please ask.

Rich

--
Richard B. Shepard, Ph.D.               |    The Environmental Permitting
Applied Ecosystem Services, Inc.        |          Accelerator(TM)
<http://www.appl-ecosys.com>     Voice: 503-667-4517      Fax: 503-667-8863
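A minimal sketch of the matrix step described above, in present-day Python 3 and NumPy rather than the 2007 API. It assumes the 28 floats fill the upper triangle of an 8 x 8 matrix row by row (28 = 8 * 7 / 2), which matches the cell order (1,2), (1,3), ... given in the message. The function names are made up for illustration, and `numpy.linalg.eigh` stands in for the `eigen()` call mentioned above because it is the NumPy routine for symmetric matrices.

```python
import numpy as np

def symmetric_from_upper(values, n=8):
    """Return an n x n matrix with ones on the diagonal and `values` in the
    upper triangle (filled row by row), mirrored into the lower triangle."""
    values = np.asarray(values, dtype=float)
    rows, cols = np.triu_indices(n, k=1)   # (0,1), (0,2), ..., (n-2,n-1)
    if values.size != rows.size:
        raise ValueError("expected %d values, got %d" % (rows.size, values.size))
    m = np.eye(n)
    m[rows, cols] = values
    m[cols, rows] = values
    return m

def principal_eigenvector(m):
    """Eigenvector belonging to the largest eigenvalue of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(m)  # eigenvalues come back ascending
    v = eigenvectors[:, -1]
    return v if v.sum() >= 0 else -v               # fix the arbitrary sign

# Made-up stand-in for one averaged tuple of 28 floats.
demo = symmetric_from_upper(np.linspace(1.0, 5.0, 28))
vec = principal_eigenvector(demo)
print(demo.shape, vec)
```

The eigenvector is only defined up to scale and sign, so the sketch flips it to be non-negative; any further normalisation (for example to sum to 1) is a separate choice.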
On Thu, 22 Feb 2007, Rich Shepard wrote:
As a newcomer to python, NumPy, and SciPy I need to learn how to most efficiently manipulate data. Today's need is reducing a list of tuples to three smaller lists of tuples, creating a symmetrical matrix from each list, and calculating the Eigenvector of each of the three symmetrical matrices.
I have a function that does part of the above. I know that it's highly crude, inefficient, and not taking advantage of python functional coding features such as introspection. That's because I'm not yet sure how to code it better.

First, a python message when I invoke the application. In this module, I have 'from scipy import *' per Travis' book. What I see as the application loads is:

    Overwriting info=<function info at 0xb60e2614> from scipy.misc (was <function info at 0xb60b5d84> from numpy.lib.utils)

This doesn't seem to harm anything, but perhaps it needs fixing.

Second, here's the function (followed by the output of the print statements):

    def weightcalc():
      # First: average for each position by category
      stmt1 = """select cat, pos, avg(pr1), avg(pr2), avg(pr3), avg(pr4), avg(pr5),
                 avg(pr6), avg(pr7), avg(pr8), avg(pr9), avg(pr10), avg(pr11), avg(pr12),
                 avg(pr13), avg(pr14), avg(pr15), avg(pr16), avg(pr17), avg(pr18),
                 avg(pr19), avg(pr20), avg(pr21), avg(pr22), avg(pr23), avg(pr24),
                 avg(pr25), avg(pr26), avg(pr27), avg(pr28) from voting group by cat, pos"""
      appData.cur.execute(stmt1)
      prefbar = appData.cur.fetchall()

      # Now, average for all positions within each category
      ec = []
      en = []
      ep = []
      nc = []
      nn = []
      np = []
      sc = []
      sn = []
      sp = []
      catEco = []
      catNat = []
      catSoc = []
      diag = identity(8, dtype=float)

      for item in prefbar:
        if item[0] == 'eco' and item[1] == 'con':
          ec.append(item[2:])
        if item[0] == 'eco' and item[1] == 'neu':
          en.append(item[2:])
        if item[0] == 'eco' and item[1] == 'pro':
          ep.append(item[2:])
        if item[0] == 'nat' and item[1] == 'con':
          nc.append(item[2:])
        if item[0] == 'nat' and item[1] == 'neu':
          nn.append(item[2:])
        if item[0] == 'nat' and item[1] == 'pro':
          np.append(item[2:])
        if item[0] == 'soc' and item[1] == 'con':
          sc.append(item[2:])
        if item[0] == 'soc' and item[1] == 'neu':
          sn.append(item[2:])
        if item[0] == 'soc' and item[1] == 'pro':
          sp.append(item[2:])

      # three lists, each of three tuples. Need to be converted to arrays and averaged.
      catEco.append(ec + en + ep)
      print catEco, '\n'
      catNat.append(nc + nn + np)
      print catNat, '\n'
      catSoc.append(sc + sn + sp)
      print catSoc

and here is the output of catEco:

    [[(2.4884848484848487, 3.3123939393939401, 3.144090909090909, 2.5676060606060607,
       3.2095151515151517, 3.4157878787878788, 2.5132727272727275, 2.7514242424242425,
       2.9628787878787879, 2.446939393939394, 2.7069393939393938, 3.1676666666666669,
       2.8530303030303035, 2.6058484848484853, 3.0955454545454546, 2.6283939393939395,
       2.4350606060606061, 3.2610303030303034, 2.3926969696969698, 2.4951212121212123,
       2.5276666666666676, 2.668848484848485, 3.4265757575757578, 2.9714545454545456,
       2.8431818181818187, 3.0674545454545461, 2.8712727272727272, 2.1262424242424243),
      (2.0477142857142856, 1.0064285714285715, 3.1869285714285711, 3.5895000000000001,
       3.9467142857142861, 3.2696428571428569, 2.9104285714285716, 2.5850714285714282,
       4.8555714285714293, 3.3554999999999997, 2.3430714285714282, 3.5795714285714282,
       1.3627857142857143, 0.83778571428571436, 2.4744999999999999, 2.8067142857142855,
       3.143642857142857, 2.4637857142857138, 3.7382142857142857, 3.2875000000000001,
       2.1167857142857143, 3.5459285714285715, 3.5667142857142857, 3.1280714285714284,
       3.580428571428572, 1.0882857142857143, 3.0217142857142858, 3.8292857142857142),
      (2.3360769230769227, 2.0547692307692311, 2.8591538461538457, 2.4986923076923073,
       2.809769230769231, 2.2041538461538464, 3.9557692307692309, 3.1109230769230769,
       1.8777692307692309, 1.6783846153846156, 2.4337692307692307, 1.8520769230769232,
       4.0975384615384618, 3.3513846153846147, 1.9008461538461536, 2.9993846153846158,
       1.8076923076923079, 2.6881538461538463, 2.453615384615385, 3.5579999999999998,
       1.2396153846153848, 3.8225384615384614, 2.8304615384615386, 2.6258461538461537,
       2.2387692307692308, 3.381615384615384, 2.8569999999999998, 2.9676153846153848)]]

Where I am stuck is making catEco (and the other two lists) NumPy arrays, and calculating the average of the three values in the same position within each tuple. Also, I need no more than 2 decimal places for each value, but I don't know where to place a format specifier.

Please suggest how to both improve the function's structure and produce an ndarray that holds the average of the tuple values in each list.

Rich
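One possible answer to the two questions above, sketched in Python 3 with made-up numbers (three tuples of 4 values instead of 28; the variable names are illustrative). It shows the position-wise average and the two usual places where "2 decimal places" can be applied: rounding the stored values, or formatting only at output time.

```python
import numpy as np

# Hypothetical stand-in for the three subcategory tuples of one category.
cat_eco = [
    (2.0, 4.0, 1.5, 3.0),
    (3.0, 5.0, 2.5, 1.0),
    (4.0, 3.0, 2.0, 2.0),
]

a_eco = np.array(cat_eco, dtype=float)   # shape (3, 4): one row per tuple
bar_eco = a_eco.mean(axis=0)             # average of the values in each position

# Rounding changes the stored values ...
rounded = np.round(bar_eco, 2)
# ... while a format specifier only changes how they are displayed.
text = ", ".join("%.2f" % x for x in bar_eco)

print(rounded)
print(text)
```

Keeping full precision in the array and formatting only when printing is usually the safer choice if the averages feed into a later eigenvector calculation.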
On Sun, 25 Feb 2007, Rich Shepard wrote:
I have a function that does part of the above. I know that it's highly crude, inefficient, and not taking advantage of python functional coding features such as introspection. That's because I'm not yet sure how to code it better.
Would still appreciate suggestions for tightening it up. The latest version is this:

    def weightcalc():
      # First: average for each position by category
      meanvotes = []

      stmt1 = """select cat, pos, avg(pr1), avg(pr2), avg(pr3), avg(pr4), avg(pr5),
                 avg(pr6), avg(pr7), avg(pr8), avg(pr9), avg(pr10), avg(pr11), avg(pr12),
                 avg(pr13), avg(pr14), avg(pr15), avg(pr16), avg(pr17), avg(pr18),
                 avg(pr19), avg(pr20), avg(pr21), avg(pr22), avg(pr23), avg(pr24),
                 avg(pr25), avg(pr26), avg(pr27), avg(pr28) from voting group by cat, pos"""
      appData.cur.execute(stmt1)
      prefbar = appData.cur.fetchall()
      # print prefbar

      # Now, average for all positions within each category
      ec = []
      en = []
      ep = []
      nc = []
      nn = []
      np = []
      sc = []
      sn = []
      sp = []
      catEco = []
      catNat = []
      catSoc = []
      diag = identity(8, dtype=float)

      for item in prefbar:
        if item[0] == 'eco' and item[1] == 'con':
          ec.append(item[2:])
        if item[0] == 'eco' and item[1] == 'neu':
          en.append(item[2:])
        if item[0] == 'eco' and item[1] == 'pro':
          ep.append(item[2:])
        if item[0] == 'nat' and item[1] == 'con':
          nc.append(item[2:])
        if item[0] == 'nat' and item[1] == 'neu':
          nn.append(item[2:])
        if item[0] == 'nat' and item[1] == 'pro':
          np.append(item[2:])
        if item[0] == 'soc' and item[1] == 'con':
          sc.append(item[2:])
        if item[0] == 'soc' and item[1] == 'neu':
          sn.append(item[2:])
        if item[0] == 'soc' and item[1] == 'pro':
          sp.append(item[2:])

      # three lists, each of three tuples. Need to be converted to arrays and averaged.
      catEco.append(ec + en + ep)
      catNat.append(nc + nn + np)
      catSoc.append(sc + sn + sp)

      # here are the numpy arrays
      aEco = array(catEco, dtype=float)
      aNat = array(catNat, dtype=float)
      aSoc = array(catSoc, dtype=float)

      # here are the numpy arrays of averages
      barEco = average(aEco, axis=1)
      barNat = average(aNat, axis=1)
      barSoc = average(aSoc, axis=1)

Got all this worked out by reading the book and trial-and-error.

Next step is to convert each of barEco, barNat, and barSoc into symmetrical matrices with unit diagonals. Each of these arrays holds the values to the right of the diagonal in the symmetrical matrices; the matching cells to the left of the diagonal are (1.0 / right cell value).

Rich
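A sketch of that next step in Python 3 and present-day NumPy, under two assumptions: the 28 averages fill the upper triangle row by row, and the lower triangle holds the reciprocals as described. Because such a matrix is reciprocal rather than strictly symmetric, the sketch uses the general `numpy.linalg.eig` instead of a symmetric-only solver. The function names are invented for illustration, and scaling the eigenvector to sum to 1 is a choice made here, not something stated in the thread.

```python
import numpy as np

def reciprocal_matrix(upper, n=8):
    """Unit diagonal, `upper` values right of the diagonal (row by row),
    and 1.0 / value in the matching cell left of the diagonal."""
    upper = np.asarray(upper, dtype=float).ravel()   # also accepts shape (1, 28)
    rows, cols = np.triu_indices(n, k=1)
    if upper.size != rows.size:
        raise ValueError("expected %d values, got %d" % (rows.size, upper.size))
    m = np.eye(n)
    m[rows, cols] = upper
    m[cols, rows] = 1.0 / upper
    return m

def principal_weights(m):
    """Principal eigenvector of a positive matrix, scaled to sum to 1."""
    eigenvalues, eigenvectors = np.linalg.eig(m)
    k = np.argmax(eigenvalues.real)          # largest eigenvalue
    v = np.abs(eigenvectors[:, k].real)      # remove the arbitrary sign
    return v / v.sum()

# Small 3 x 3 demonstration with made-up comparison values.
demo = reciprocal_matrix([0.5, 0.25, 0.5], n=3)
weights = principal_weights(demo)
print(np.round(weights, 2))
```

With the arrays from the function above, the call would look like `principal_weights(reciprocal_matrix(barEco))`, since `ravel()` flattens the (1, 28) result of `average(..., axis=1)`.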
Hi,

here's your code as I would have written it. :-) My comments are those starting with four #'s.

As a side remark, usually one uses four spaces for indentation. I hope I did not mix it up.

An alternative approach to what I did below would be to map the keys ('eco', 'nat', 'soc') to integers and use lists instead of dicts.

Johannes

    def weightcalc():
        # First: average for each position by category
        meanvotes = []   #### Do you use this?

        #### not much can be done about this
        stmt1 = """select cat, pos, avg(pr1), avg(pr2), avg(pr3), avg(pr4), avg(pr5),
                   avg(pr6), avg(pr7), avg(pr8), avg(pr9), avg(pr10), avg(pr11), avg(pr12),
                   avg(pr13), avg(pr14), avg(pr15), avg(pr16), avg(pr17), avg(pr18),
                   avg(pr19), avg(pr20), avg(pr21), avg(pr22), avg(pr23), avg(pr24),
                   avg(pr25), avg(pr26), avg(pr27), avg(pr28) from voting group by cat, pos"""
        appData.cur.execute(stmt1)
        prefbar = appData.cur.fetchall()
        # print prefbar

        #### using a dict saves us from creating all the lists
        data = {}

        for item in prefbar:
            #### create dict entries on demand
            if item[0] not in data:
                data[item[0]] = {}
            if item[1] not in data[item[0]]:
                data[item[0]][item[1]] = []

            #### append to list
            data[item[0]][item[1]].append(item[2:])

        catarrays = {}
        averages = {}
        for key in ['eco', 'nat', 'soc']:
            catarrays[key] = []
            for subkey in ['con', 'neu', 'pro']:
                #### btw. I don't understand why you throw con, neu, pro in one list
                #### now after sorting them out in advance.
                try:
                    catarrays[key].append(data[key][subkey])
                except KeyError:
                    #### data[key][subkey] was not set
                    print 'No data for %s,%s'%(key, subkey)

            #### convert to array
            catarrays[key] = array(catarrays[key])
            #### average
            averages[key] = average(catarrays[key], axis=1)
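The alternative mentioned above (mapping the keys to integers) could look roughly like the following Python 3 sketch. It goes one step further than lists and uses a single 3-D NumPy array indexed by category, subcategory, and value position. `CATS`, `SUBS`, and `category_means` are invented names, the rows are made up, and one row per category/subcategory pair is assumed.

```python
import numpy as np

CATS = {'eco': 0, 'nat': 1, 'soc': 2}
SUBS = {'con': 0, 'neu': 1, 'pro': 2}

def category_means(rows, n_values):
    """rows: iterable of (cat, pos, v1, ..., vN), one row per (cat, pos) pair.
    Returns an array of shape (3, n_values): one averaged row per category."""
    cube = np.full((len(CATS), len(SUBS), n_values), np.nan)
    for row in rows:
        cube[CATS[row[0]], SUBS[row[1]], :] = row[2:]
    # Average over the subcategory axis; nanmean skips a missing subcategory.
    return np.nanmean(cube, axis=1)

# Made-up rows with 2 values each instead of 28.
rows = [
    ('eco', 'con', 1.0, 2.0), ('eco', 'neu', 2.0, 4.0), ('eco', 'pro', 3.0, 6.0),
    ('nat', 'con', 4.0, 4.0), ('nat', 'neu', 4.0, 4.0), ('nat', 'pro', 4.0, 4.0),
    ('soc', 'con', 0.0, 1.0), ('soc', 'neu', 1.0, 1.0), ('soc', 'pro', 2.0, 1.0),
]
means = category_means(rows, n_values=2)
print(means)
```

Row `CATS['eco']` of the result then plays the role of barEco, and so on for the other two categories.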
On Mon, 26 Feb 2007, Johannes Loehnert wrote:
here's your code as I would have written it. :-) My comments are those starting with four #'s.
Thank you, Johannes.
As a side remark, usually one uses four spaces for indentation. I hope I did not mix it up.
Yes, I know that's the standard for python code contributed to projects. I've used two space tabs for indenting C code for a couple of decades now, and I find it easier to read. Since this code is being used by us, we format for our convenience.
An alternative approach to what I did below would be to map the keys ('eco', 'nat', 'soc') to integers and use lists instead of dicts.
That can be done for this function. But, if dicts work, that's OK, too.
    def weightcalc():
        # First: average for each position by category
        meanvotes = []   #### Do you use this?
No, I meant to take that out, and have.
        #### not much can be done about this
        stmt1 = """select cat, pos, avg(pr1), avg(pr2), avg(pr3), avg(pr4), avg(pr5),
                   avg(pr6), avg(pr7), avg(pr8), avg(pr9), avg(pr10), avg(pr11), avg(pr12),
                   avg(pr13), avg(pr14), avg(pr15), avg(pr16), avg(pr17), avg(pr18),
                   avg(pr19), avg(pr20), avg(pr21), avg(pr22), avg(pr23), avg(pr24),
                   avg(pr25), avg(pr26), avg(pr27), avg(pr28) from voting group by cat, pos"""
        appData.cur.execute(stmt1)
        prefbar = appData.cur.fetchall()
        # print prefbar
This is the source of the data: a SQLite3 database table.
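For what it's worth, the long column list can be generated rather than typed. The sketch below (Python 3, standard-library `sqlite3`) builds the same statement from the table and column names used in the thread (`voting`, `cat`, `pos`, `pr1` to `pr28`) and runs it against an in-memory stand-in database filled with made-up votes. The real schema and the `appData` cursor are not shown in the thread, so everything about the table definition here is an assumption.

```python
import sqlite3

N_PAIRS = 28  # number of pr columns, as in the query above

# Build "avg(pr1), avg(pr2), ..., avg(pr28)" instead of typing it out.
avg_cols = ", ".join("avg(pr%d)" % i for i in range(1, N_PAIRS + 1))
stmt = "select cat, pos, %s from voting group by cat, pos" % avg_cols

# In-memory stand-in for the real database, with made-up votes.
con = sqlite3.connect(":memory:")
pr_cols = ", ".join("pr%d real" % i for i in range(1, N_PAIRS + 1))
con.execute("create table voting (cat text, pos text, %s)" % pr_cols)
placeholders = ", ".join("?" * (N_PAIRS + 2))
con.executemany(
    "insert into voting values (%s)" % placeholders,
    [("eco", "con") + (1.0,) * N_PAIRS,
     ("eco", "con") + (3.0,) * N_PAIRS,
     ("eco", "pro") + (5.0,) * N_PAIRS],
)

# One averaged row comes back per (cat, pos) pair.
prefbar = con.execute(stmt).fetchall()
con.close()
print(len(prefbar), prefbar[0][:4])
```

Only integers are interpolated into the SQL text, so the string formatting does not open the statement to injected input.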
        #### using a dict saves us from creating all the lists
        data = {}

        for item in prefbar:
            #### create dict entries on demand
            if item[0] not in data:
                data[item[0]] = {}
            if item[1] not in data[item[0]]:
                data[item[0]][item[1]] = []

            #### append to list
            data[item[0]][item[1]].append(item[2:])
The above seems to do the opposite of what I need. 'prefbar' is a list of tuples, and the first two items of each tuple are strings. I want to remove those strings and have only the reals. Doesn't the above just copy prefbar to data?
        catarrays = {}
        averages = {}
        for key in ['eco', 'nat', 'soc']:
            catarrays[key] = []
            for subkey in ['con', 'neu', 'pro']:
                #### btw. I don't understand why you throw con, neu, pro in one list
                #### now after sorting them out in advance.
Let me try to explain. I have 9 sets of data as records in the database. The sets are eco/con, eco/neu, eco/pro, nat/con, nat/neu, nat/pro, soc/con, soc/neu, and soc/pro. First, I need to average the 28 items in each of those 9 sets. Second, I need to average the three average values for each of the 28 items within the main sets of eco, nat, and soc. Results can be skewed if only a single, overall average of the 28 items is calculated in a single step.
                try:
                    catarrays[key].append(data[key][subkey])
                except KeyError:
                    #### data[key][subkey] was not set
                    print 'No data for %s,%s'%(key, subkey)

            #### convert to array
            catarrays[key] = array(catarrays[key])
            #### average
            averages[key] = average(catarrays[key], axis=1)
This seems to be taking the averages in one step. We need them to be in two steps. Am I mis-reading this?

Thanks,

Rich
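The concern about one-step versus two-step averaging can be illustrated with a small made-up example in plain Python 3. It assumes the subcategories have different numbers of voters, which is the situation in which the two results differ; the numbers are invented.

```python
# Hypothetical votes for a single comparison within one category, with
# unequal numbers of voters per subcategory.
votes = {
    'con': [1.0, 1.0, 1.0, 1.0],   # four voters
    'neu': [3.0],                  # one voter
    'pro': [5.0],                  # one voter
}

# One step: pool every vote, so the largest group dominates.
all_votes = [v for group in votes.values() for v in group]
pooled = sum(all_votes) / len(all_votes)

# Two steps: average within each subcategory, then average those means.
sub_means = [sum(group) / len(group) for group in votes.values()]
two_step = sum(sub_means) / len(sub_means)

print(pooled, two_step)
```

When every subcategory has the same number of voters, the two numbers coincide; the two-step form gives each subcategory equal weight regardless of how many people voted in it.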
Hi,
        #### using a dict saves us from creating all the lists
        data = {}

        for item in prefbar:
            #### create dict entries on demand
            if item[0] not in data:
                data[item[0]] = {}
            if item[1] not in data[item[0]]:
                data[item[0]][item[1]] = []

            #### append to list
            data[item[0]][item[1]].append(item[2:])
The above seems to do the opposite of what I need. 'prefbar' is a list of tuples, and the first two items of each tuple are strings. I want to remove those strings and have only the reals. Doesn't the above just copy prefbar to data?
No. data is a dict containing the three main keys (eco, nat, soc). Each maps to a dict containing the subkeys (con, neu, pro). Each of those maps to a list of data. E.g. for

    prefbar = [('eco', 'con', 1, 2, 3), ('eco', 'con', 4, 5, 6), ('eco', 'neu', 7, 8, 9), ('nat', 'neu', 10, 11, 12)]

the result would be

    data == {'eco': {'con': [(1, 2, 3), (4, 5, 6)], 'neu': [(7, 8, 9)]}, 'nat': {'neu': [(10, 11, 12)]}}

So with data['eco']['con'] you get back [(1, 2, 3), (4, 5, 6)].
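The same grouping can be written a little more compactly in Python 3 with `collections.defaultdict`, which creates the nested entries on demand. This is only a sketch of the idea using the example rows from this message; `group_rows` is an invented name, and the star-unpacking in the loop needs Python 3.

```python
from collections import defaultdict

def group_rows(rows):
    """Group (cat, pos, v1, v2, ...) rows into {cat: {pos: [values, ...]}}."""
    data = defaultdict(lambda: defaultdict(list))
    for cat, pos, *values in rows:
        data[cat][pos].append(tuple(values))
    # Convert to plain dicts so that missing keys raise KeyError again.
    return {cat: dict(subs) for cat, subs in data.items()}

prefbar = [('eco', 'con', 1, 2, 3), ('eco', 'con', 4, 5, 6),
           ('eco', 'neu', 7, 8, 9), ('nat', 'neu', 10, 11, 12)]
data = group_rows(prefbar)
print(data['eco']['con'])
```

Converting back to plain dicts at the end keeps the `except KeyError` handling in the later loop meaningful; a live defaultdict would silently create empty lists instead.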
        catarrays = {}
        averages = {}
        for key in ['eco', 'nat', 'soc']:
            catarrays[key] = []
            for subkey in ['con', 'neu', 'pro']:
                #### btw. I don't understand why you throw con, neu, pro in one list
                #### now after sorting them out in advance.
Let me try to explain. I have 9 sets of data as records in the database. The sets are eco/con, eco/neu, eco/pro, nat/con, nat/neu, nat/pro, soc/con, soc/neu, and soc/pro.
So you get only one data row for each category/subcategory combination? Or can there be multiple?
First, I need to average the 28 items in each of those 9 sets. Second, I need to average the three average values for each of the 28 items within the main sets of eco, nat, and soc.
What do you mean by item? A float number?
Results can be skewed if only a single, overall average of the 28 items is calculated in a single step.
                try:
                    catarrays[key].append(data[key][subkey])
                except KeyError:
                    #### data[key][subkey] was not set
                    print 'No data for %s,%s'%(key, subkey)

            #### convert to array
            catarrays[key] = array(catarrays[key])
            #### average
            averages[key] = average(catarrays[key], axis=1)
This seems to be taking the averages in one step. We need them to be in two steps. Am I mis-reading this?
Well, it ought to do exactly what your code did. If I am not mistaken, averages['eco'] == barEco.

Johannes
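A quick way to check a claim like this is to compare shapes and values on a tiny made-up input. The Python 3 sketch below assumes one row per category/subcategory pair (which is what the `group by cat, pos` query returns) and uses 2 values instead of 28. It suggests that the two ways of building the nested list put the subcategories on different axes, so the axis passed to `average` has to match the layout; this is an observation about array shapes, not a statement about which layout either author intended.

```python
import numpy as np

con, neu, pro = [(1.0, 2.0)], [(3.0, 4.0)], [(5.0, 6.0)]   # one row each

# Concatenate the three lists, then wrap the result in an outer list.
concatenated = np.array([con + neu + pro])        # shape (1, 3, 2)
# Append each subcategory list separately.
appended = np.array([con, neu, pro])              # shape (3, 1, 2)

print(concatenated.shape, np.average(concatenated, axis=1).shape)
print(appended.shape, np.average(appended, axis=1).shape)

# Averaging over whichever axis holds the subcategories gives the same numbers.
bar_a = np.average(concatenated, axis=1).ravel()
bar_b = np.average(appended, axis=0).ravel()
print(bar_a, bar_b)
```

Printing `.shape` at each step is a cheap safeguard before feeding the averages into the matrix construction.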
On Tue, 27 Feb 2007, Johannes Loehnert wrote:
No. data is a dict containing the three main keys (eco, nat, soc). Each maps to a dict containing the subkeys (con, neu, pro). Each of those maps to a list of data. E.g. for
prefbar = [('eco', 'con', 1, 2, 3), ('eco', 'con', 4,5,6), ('eco', 'neu', 7,8,9), ('nat', 'neu', 10,11,12)]
the result would be
data == {'eco': {'con': [(1,2,3), (4,5,6)], 'neu': [(7,8,9)]}, 'nat': {'neu': [(10,11,12)]}}.
So with data['eco']['con'] you get back [(1,2,3), (4,5,6)].
A-ha! Now I see it. Thanks very much, Johannes.
So you get only one data row for each category/subcategory combination? Or can there be multiple?
Multiple rows for each category/subcategory.
First, I need to average the 28 items in each of those 9 sets. Second, I need to average the three average values for each of the 28 items within the main sets of eco, nat, and soc.
What do you mean by item? A float number?
Yes. The floats are averaged by subcategory, then the subcategories are averaged by category.
Well, it ought to do exactly what your code did. If I am not mistaken, averages['eco'] == barEco.
I see now how to write the code so it works. I mis-understood the first message.

Again, thank you very much.

Rich
Participants (2): Johannes Loehnert, Rich Shepard