[Tutor] Need advise on protein sequence alignment code in python 2.7 giving wrong output
Белякова Анастасия
nasbelyakova at gmail.com
Tue Aug 11 10:53:38 CEST 2015
I am trying to develop code to perform global sequence alignment in python
2.7 for proteins with gap penalty of -5 and blossom64 scoring matrix. Code
I have now give wrong output.
When I give 'PLEASANTLY' and 'MEANLY' as input, I get 'PLEASANTLY' and
'M-EAN--L-Y' as output instead of 'PLEASANTLY' and '-MEA--N-LY'. Blossom62
scoring matrix is given as dictionary {'A': {'A': 4, 'C': 0, etc. -5 is
insertion/deletion penalty.
Here is a code:
def scoring(v,w,blossom62):
S = [[0]*(len(w)+1) for _ in xrange(len(v)+1)]
for i in range(0,len(w)+1):
S[0][i]=i*-5
for j in range (0,len(v)+1):
S[j][0]=j*-5
for i in xrange(len(v)):
for j in xrange(len(w)):
S[i+1][j+1] =
max(S[i+1][j]-5,S[i][j+1]-5,S[i][j]+blossom62[v[i]][w[j]] )
backtrack=[[0]*(len(w)) for _ in xrange(len(v))]
for i in xrange(len(v)):
for j in xrange(len(w)):
if max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j]:
backtrack[i][j]= 0
elif max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i][j-1]:
backtrack[i][j]= 1
elif max(S[i][j-1],S[i-1][j],S[i-1][j-1]) == S[i-1][j-1]:
backtrack[i][j]= 2
def insert_indel(word,pos):
return word[:pos]+'-'+word[pos:]
v_aligned, w_aligned = v, w
i,j =len(v)-1,len(w)-1
while i*j!=0:
if backtrack[i][j]==0:
i-=1
w_aligned=insert_indel(w_aligned,j)
elif backtrack[i][j]==1:
j-=1
v_aligned=insert_indel(v_aligned,i)
elif backtrack[i][j]==2:
j-=1
i-=1
print v_aligned
print w_aligned
What is wrong and how to fix it?
More information about the Tutor
mailing list