[Tutor] efficient method to search between two lists

Srinivas Iyyer srini_iyyer_bio at yahoo.com
Thu Mar 23 04:09:01 CET 2006


Dear group, 
I have a question about solving a problem in a simpler
and more efficient way. 

I have two lists:

list_a = ['S83513\tNM_001117', 'X60435\tNM_001117',
'U75370\tNM_005035', 'U05861\tNM_001353',
'S68290\tNM_001353', 'D86864\tNM_145349',
'D86864\tNM_003693', 'D86864\tNM_145351',
'D63483\tNM_145349', 'D63483\tNM_003693',
'D63483\tNM_145351', 'S66427\tNM_002892',
'S57153\tNM_002892']


list_b = ['HC_G110\t1000_at\tS83513',
'HC_G110\t1001_at\tD63483',
'HC_G110\t1002_f_at\tD86864',
'HC_G112\t1003_s_at\tX60435',
'HC_G112\t1004_at\tS57153']

method a:
>>> for x in list_b:
...     cola = x.split('\t')[2]
...     for m in list_a:
...             colb = m.split('\t')[0]
...             if cola == colb:
...                     print x+'\t'+m
...
HC_G110 1000_at S83513  S83513  NM_001117
HC_G110 1001_at D63483  D63483  NM_145349
HC_G110 1001_at D63483  D63483  NM_003693
HC_G110 1001_at D63483  D63483  NM_145351
HC_G110 1002_f_at       D86864  D86864  NM_145349
HC_G110 1002_f_at       D86864  D86864  NM_003693
HC_G110 1002_f_at       D86864  D86864  NM_145351
HC_G112 1003_s_at       X60435  X60435  NM_001117
HC_G112 1004_at S57153  S57153  NM_002892

method b:
for m in list_b:
        cols = m.split('\t')[2]
        for x in list_a:
                if x.startswith(cols):
                        print m+'\t'+x


HC_G110 1000_at S83513  S83513  NM_001117
HC_G110 1001_at D63483  D63483  NM_145349
HC_G110 1001_at D63483  D63483  NM_003693
HC_G110 1001_at D63483  D63483  NM_145351
HC_G110 1002_f_at       D86864  D86864  NM_145349
HC_G110 1002_f_at       D86864  D86864  NM_003693
HC_G110 1002_f_at       D86864  D86864  NM_145351
HC_G112 1003_s_at       X60435  X60435  NM_001117
HC_G112 1004_at S57153  S57153  NM_002892

Problem:
# of elements in list_a = 246230
# of elements in list_b = 213612

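Both methods are brute force: each element of list_b is
compared against every element of list_a (about 5.3 x 10^10
comparisons), which is highly time consuming. 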

Although a dictionary lookup is very fast, both columns
of list_a contain duplicates, so a plain dictionary (one
value per key) does not seem to be an option. 

I cannot think of a smarter method, since these are the
only two ways I know. 

Would anyone please help me by suggesting a neat and
efficient way?  

thanks in advance. 

Srini
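
On the duplicate-key concern above, one possible approach (a sketch
only, not timed on the full data set) is to let each dictionary key
map to a list of rows instead of a single value, so that repeated
accessions in list_a are all kept. The function name
join_on_accession is made up for this illustration, and the code
assumes the accession is column 0 of list_a and column 2 of list_b,
as in the sample data. print(line) with a single argument behaves
the same in Python 2 and Python 3.

```python
list_a = ['S83513\tNM_001117', 'X60435\tNM_001117',
          'U75370\tNM_005035', 'U05861\tNM_001353',
          'S68290\tNM_001353', 'D86864\tNM_145349',
          'D86864\tNM_003693', 'D86864\tNM_145351',
          'D63483\tNM_145349', 'D63483\tNM_003693',
          'D63483\tNM_145351', 'S66427\tNM_002892',
          'S57153\tNM_002892']

list_b = ['HC_G110\t1000_at\tS83513',
          'HC_G110\t1001_at\tD63483',
          'HC_G110\t1002_f_at\tD86864',
          'HC_G112\t1003_s_at\tX60435',
          'HC_G112\t1004_at\tS57153']


def join_on_accession(list_a, list_b):
    """Pair every row of list_b with all rows of list_a that share
    its accession (hypothetical helper, for illustration only)."""
    # Build the index once: accession -> list of all list_a rows.
    # Storing a list per key keeps duplicate accessions.
    index = {}
    for row in list_a:
        accession = row.split('\t')[0]
        index.setdefault(accession, []).append(row)

    # One dictionary lookup per list_b row replaces the inner loop.
    matches = []
    for row in list_b:
        accession = row.split('\t')[2]
        for hit in index.get(accession, []):
            matches.append(row + '\t' + hit)
    return matches


matches = join_on_accession(list_a, list_b)
for line in matches:
    print(line)
```

With the sample lists this produces the same nine lines, in the same
order, as the nested loops. Each list is traversed only once, so the
work grows with len(list_a) + len(list_b) plus the number of matches
rather than with their product. Unlike startswith in method b, the
lookup compares the whole accession field, so an accession that is
merely a prefix of another would not match.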

