[Tutor] efficient method to search between two lists

Terry Carroll carroll at tjc.com
Thu Mar 23 07:00:02 CET 2006


On Wed, 22 Mar 2006, Srinivas Iyyer wrote:

> I cannot think of any other smart method since these
> are the only two ways I know possible. 
> 
> Would any one please help me suggesting a neat and
> efficient way.  

I'm thinking:

1) use sets to narrow both lists down to only those elements whose key 
also appears in the other list; 

2) loop and compare, but only over those elements you know will match 
somewhere (because of step #1).

One try:

list_a = ['S83513\tNM_001117', 'X60435\tNM_001117',
'U75370\tNM_005035', 'U05861\tNM_001353',
'S68290\tNM_001353', 'D86864\tNM_145349',
'D86864\tNM_003693', 'D86864\tNM_145351',
'D63483\tNM_145349', 'D63483\tNM_003693',
'D63483\tNM_145351', 'S66427\tNM_002892',
'S57153\tNM_002892']

list_b = ['HC_G110\t1000_at\tS83513',
'HC_G110\t1001_at\tD63483',
'HC_G110\t1002_f_at\tD86864',
'HC_G112\t1003_s_at\tX60435',
'HC_G112\t1004_at\tS57153']

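# Step 1: collect the key field from each list and intersect them.
set_a = set([x.split('\t')[0] for x in list_a])
set_b = set([x.split('\t')[2] for x in list_b])
intersection = set_a & set_b   # left as a set so "in" tests stay fast

# Step 2: pair each record with its key, keeping only keys found in both
# lists.  Built once here rather than rebuilt on every pass of the loop.
matches_a = [(x, x.split('\t')[0]) for x in list_a
             if x.split('\t')[0] in intersection]
matches_b = [(x, x.split('\t')[2]) for x in list_b
             if x.split('\t')[2] in intersection]

for (whole_b, keypart_b) in matches_b:
    for (whole_a, keypart_a) in matches_a:
        if keypart_b == keypart_a:
            print(whole_b + whole_a)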

Gives as output:

HC_G110 1000_at S83513S83513    NM_001117
HC_G110 1001_at D63483D63483    NM_145349
HC_G110 1001_at D63483D63483    NM_003693
HC_G110 1001_at D63483D63483    NM_145351
HC_G110 1002_f_at       D86864D86864    NM_145349
HC_G110 1002_f_at       D86864D86864    NM_003693
HC_G110 1002_f_at       D86864D86864    NM_145351
HC_G112 1003_s_at       X60435X60435    NM_001117
HC_G112 1004_at S57153S57153    NM_002892


(Note: my tab settings are different from yours, so the column spacing 
in this output will look different from what you see.)
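
A possible variation on the same idea, offered only as a sketch: instead of 
filtering with sets and then comparing every remaining pair, index list_a by 
its key in a dictionary and look each list_b key up directly.  The names 
join_on_key and index_a are made up for this illustration, and the sample 
data is a shortened copy of the lists above.  It assumes, as above, that the 
key is field 0 of each list_a record and field 2 of each list_b record.

```python
from collections import defaultdict


def join_on_key(list_a, list_b):
    """Pair each list_b record with every list_a record sharing its key."""
    # Build the index once: key (field 0 of list_a) -> matching records.
    index_a = defaultdict(list)
    for record_a in list_a:
        index_a[record_a.split('\t')[0]].append(record_a)

    joined = []
    for record_b in list_b:
        key = record_b.split('\t')[2]
        # .get() avoids adding empty entries for keys missing from list_a.
        for record_a in index_a.get(key, []):
            joined.append(record_b + record_a)
    return joined


list_a = ['S83513\tNM_001117', 'D63483\tNM_145349', 'D63483\tNM_003693']
list_b = ['HC_G110\t1000_at\tS83513',
          'HC_G110\t1001_at\tD63483',
          'HC_G112\t1004_at\tS57153']

for line in join_on_key(list_a, list_b):
    print(line)
```

Each list is walked once, so the work grows with the size of the lists plus 
the number of matches, rather than with the product of the two filtered 
lists.  Records come out in the same order as the nested loops produce them.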


