recombination variations
David Siedband
technique at oceanicsky.com
Tue Nov 30 06:06:45 EST 2004
The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents. The A, T, and G are fine but the S
represents C or G. I want to take this input:
[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]
and make the list:
[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]
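The step from that input to that output is a Cartesian product over the per-position choices. A minimal sketch, assuming a Python version that provides itertools.product (the name 'positions' is only illustrative):

```python
from itertools import product

# One inner list per position; ambiguous positions list every option.
positions = [['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]

# product(*positions) yields one tuple per combination, with the
# rightmost position varying fastest.
sequences = [''.join(combo) for combo in product(*positions)]
print(sequences)  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```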
The code below is what I have so far. 'alphabet' is a dictionary that
designates the set of base pairs that each letter represents (for
example, for S above it gives C and G). I call these ambiguous base
pairs because they could be more than one, hence the function name
'unambiguate'. It makes a list of sequences with only As, Ts, Cs, and Gs
and none of the ambiguous base pair designations.
The function 'unambiguate_bp' takes a sequence and the index of a base
pair in it and returns a set of sequences with that base pair replaced
by each of its unambiguous possibilities.
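As a worked example of that single-position step, here is a small sketch that uses a one-entry stand-in for the alphabet dictionary (the cut-down dictionary is an assumption made only to keep the example short):

```python
# Illustrative one-entry subset of the alphabet, not the full table.
alphabet = {'S': ['C', 'G']}

def unambiguate_bp(seq, bp):
    # One copy of seq per base that the letter at index bp can stand for.
    return [seq[:bp] + base + seq[bp + 1:] for base in alphabet[seq[bp]]]

print(unambiguate_bp('ATSGS', 2))  # ['ATCGS', 'ATGGS']
print(unambiguate_bp('ATSGS', 4))  # ['ATSGC', 'ATSGG']
```

Only the chosen position changes; any other ambiguous letter is left in place for a later pass.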
The function 'unambiguate_seq' takes a sequence and runs unambiguate_bp
on each base pair in the sequence. Each time it does a base pair, it
replaces the list of sequences it's working on with the output from
unambiguate_bp. It's a bit confusing. I'd like it to be clearer.
Is there a better way to do this?
--
David Siedband
generation-xml.com
def unambiguate_bp(seq, bp):
    # Return one copy of seq per base that the letter at index bp stands for.
    seq_set = []
    for i in alphabet[seq[bp]]:
        seq_set.append(seq[:bp]+i+seq[bp+1:])
    return seq_set

def unambiguate_seq(seq):
    # Expand one position at a time, carrying forward every partial result.
    result = [seq]
    for i in range(len(seq)):
        result_tmp=[]
        for j in result:
            result_tmp = result_tmp + unambiguate_bp(j,i)
        result = result_tmp
    return result
alphabet = {
    'A' : ['A'],
    'T' : ['T'],
    'C' : ['C'],
    'G' : ['G'],
    'W' : ['A','T'],
    'M' : ['A','C'],
    'R' : ['A','G'],
    'Y' : ['T','C'],
    'K' : ['T','G'],
    'S' : ['C','G'],
    'H' : ['A','T','C'],
    'D' : ['A','T','G'],
    'V' : ['A','G','C'],
    'B' : ['C','T','G'],
    'N' : ['A','T','C','G']
}
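
One possible shorter formulation, offered as a sketch rather than a definitive answer: look up the choices for every letter first, then let itertools.product combine them. The names 'codes' and 'expand' are hypothetical, and the mapping is written with strings instead of lists purely for brevity.

```python
from itertools import product

# Same letter-to-bases mapping as the alphabet dictionary, written with
# strings instead of lists (an assumption made here for brevity).
codes = {
    'A': 'A', 'T': 'T', 'C': 'C', 'G': 'G',
    'W': 'AT', 'M': 'AC', 'R': 'AG', 'Y': 'TC', 'K': 'TG', 'S': 'CG',
    'H': 'ATC', 'D': 'ATG', 'V': 'AGC', 'B': 'CTG', 'N': 'ATCG',
}

def expand(seq):
    """Hypothetical helper: list every unambiguous sequence seq stands for."""
    # One string of allowed bases per position in seq.
    choices = [codes[letter] for letter in seq]
    # Join each combination of per-position choices back into a string.
    return [''.join(combo) for combo in product(*choices)]

print(expand('ATSGS'))  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

The number of results is the product of the number of choices at each position, so a sequence with many N's grows quickly; iterating over product(*choices) directly instead of building the list may be preferable in that case.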