# recombination variations

David Siedband technique at oceanicsky.com
Tue Nov 30 12:06:45 CET 2004

```
The problem I'm solving is to take a sequence like 'ATSGS' and make all
the DNA sequences it represents.  The A, T, and G are fine but the S
represents C or G.  I want to take this input:

[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]

and make the list:

[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]

The code below is what I have so far:  'alphabet' is a dictionary that
designates the set of base pairs that each letter represents (for
example, for S above it gives C and G).  I call these ambiguous base
pairs because they could be more than one.  Thus the function name
'unambiguate'.  It makes a list of sequences with only A, T, C, and Gs
and none of the ambiguous base pair designations.

The function 'unambiguate_bp' takes a sequence and the position of a
base pair in it and returns a set of sequences with that base pair
replaced by each of its unambiguous possibilities.

The function unambiguate_seq takes a sequence and runs unambiguate_bp
on each base pair in the sequence.  Each time it does a base pair it
replaces the set of things it's working on with the output from the
unambiguate_bp.  It's a bit confusing.  I'd like it to be clearer.

Is there a better way to do this?
--
David Siedband
generation-xml.com

def unambiguate_bp(seq, bp):
    # Replace the letter at index bp with each base it can stand for.
    seq_set = []
    for i in alphabet[seq[bp]]:
        seq_set.append(seq[:bp]+i+seq[bp+1:])
    return seq_set

def unambiguate_seq(seq):
    # Expand one position at a time, carrying the growing list forward.
    result = [seq]
    for i in range(len(seq)):
        result_tmp=[]
        for j in result:
            result_tmp = result_tmp + unambiguate_bp(j,i)
        result = result_tmp
    return result

alphabet = {
    'A' : ['A'],
    'T' : ['T'],
    'C' : ['C'],
    'G' : ['G'],
    'W' : ['A','T'],
    'M' : ['A','C'],
    'R' : ['A','G'],
    'Y' : ['T','C'],
    'K' : ['T','G'],
    'S' : ['C','G'],
    'H' : ['A','T','C'],
    'D' : ['A','T','G'],
    'V' : ['A','G','C'],
    'B' : ['C','T','G'],
    'N' : ['A','T','C','G']
}

```
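
One possible answer to the question in the post, offered as a sketch rather than the definitive way to do it: the list being asked for is the Cartesian product of the per-position choices, and `itertools.product` (available from Python 2.6 on, so newer than the post itself) computes that directly. The names `ALPHABET` and `expand` are made up for this illustration, and the bases for each letter are written as strings instead of lists purely as shorthand. The mapping and the order of the bases are copied from the 'alphabet' dictionary in the post.

```python
from itertools import product

# Same mapping as the 'alphabet' dictionary in the post, with each set of
# bases written as a string instead of a list (a shorthand assumed here).
ALPHABET = {
    'A': 'A', 'T': 'T', 'C': 'C', 'G': 'G',
    'W': 'AT', 'M': 'AC', 'R': 'AG', 'Y': 'TC', 'K': 'TG', 'S': 'CG',
    'H': 'ATC', 'D': 'ATG', 'V': 'AGC', 'B': 'CTG', 'N': 'ATCG',
}

def expand(seq):
    """Return every unambiguous sequence that seq can stand for."""
    # One group of candidate bases per position, e.g. 'ATS' -> ['A', 'T', 'CG'].
    choices = [ALPHABET[letter] for letter in seq]
    # product() picks one base from each group in every possible combination.
    return [''.join(combo) for combo in product(*choices)]

print(expand('ATSGS'))  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

This keeps the result order of the two-function version for the example in the post, and it avoids rebuilding the intermediate list once per position. The number of results is the product of the group sizes, so a sequence with many N's grows very quickly. In that case, looping over `product(*choices)` directly instead of building the full list may be the safer choice.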