Please help with optimisation of this code - update of a given table according to another table

Farraige farraige at go2.pl
Wed Nov 8 05:18:14 EST 2006


Hi, I need your help...
I am implementing a method that updates a given table (a table is
represented as a list of lists of strings) according to another table (a
kind of merge)...

This method takes the following arguments:
t1                 - the table we would like to update
t2                 - the table we would like to take data from
keyColumns         - list of key column indexes, e.g. [0, 1]
columnsToBeUpdated - list of column indexes we would like to update in
                     our table T1, e.g. [2, 4]

Let's say we have a table T1:

A  B  C  D  E
-------------
1  4  5  7  7
3  4  0  0  0

and we call a method mergeTable(T1, T2, [0,1], [2,4])

This means that we would like to update columns C and E of table T1 with
data from table T2, but only where the key columns A and B are equal in
both tables... The key is guaranteed to be unique in both tables, so when
I find a row with the same key in table T2 I do the merge, stop, and go
on to the next row of table T1...

Let's say T2 looks like this:

A  B  C  D  E
-------------
2  2  8  8  8
1  4  9  9  9

So after execution of our mergeTable method, table T1 should look like
this:

A  B  C  D  E
-------------
1  4  9  7  9
3  4  0  0  0

The second row ['3', '4', '0', '0', '0'] didn't change because there was
no row in table T2 with key = 3, 4.
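
For concreteness, the example tables above would look something like this
in Python (each table a list of rows, each row a list of strings; I am
assuming here that the header row A B C D E is not stored in the data):

t1 = [['1', '4', '5', '7', '7'],
      ['3', '4', '0', '0', '0']]

t2 = [['2', '2', '8', '8', '8'],
      ['1', '4', '9', '9', '9']]

keyColumns = [0, 1]          # columns A and B form the key
columnsToBeUpdated = [2, 4]  # columns C and E get updated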

The main part of my algorithm now looks something like ...

def merge(t1, t2, keyColumns, columnsToBeUpdated):

    .......

    for row_t1 in t1:
        for row_t2 in t2:
            # compare the key columns of the two rows
            if [row_t1[i] for i in keyColumns] == [row_t2[i] for i in keyColumns]:
                # the keys are the same - copy the requested columns
                for colIndex in columnsToBeUpdated:
                    row_t1[colIndex] = row_t2[colIndex]

                # go outside the inner loop - we found the row with
                # the same (unique) key in table T2
                break

In my algorithm I have two for loops and I have no idea how to optimise
it (maybe with map?).
I call this method on very large data, and performance is a critical
issue for me :(

I would be grateful for any ideas.
Thanks in advance!
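
One possible direction (a sketch only, not the code from the post above;
the function name and arguments simply mirror the description): build a
dictionary that indexes t2 by the tuple of its key-column values, so each
row of t1 needs a single lookup instead of a scan over all of t2:

def merge(t1, t2, keyColumns, columnsToBeUpdated):
    # one pass over t2: index each row by its key tuple
    # (the keys are assumed unique, as stated above)
    index = {}
    for row_t2 in t2:
        index[tuple(row_t2[i] for i in keyColumns)] = row_t2

    # one pass over t1: a single dictionary lookup per row
    for row_t1 in t1:
        row_t2 = index.get(tuple(row_t1[i] for i in keyColumns))
        if row_t2 is not None:
            for colIndex in columnsToBeUpdated:
                row_t1[colIndex] = row_t2[colIndex]

This replaces the double loop, roughly O(len(t1) * len(t2)) comparisons,
with roughly O(len(t1) + len(t2)) work, at the cost of building the extra
dictionary.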



