Loop from 'aaaa' to 'tttt' ?

Alex Martelli aleax at aleax.it
Tue Jun 17 05:58:05 EDT 2003


Des Small wrote:

> Lars Schaps <tuffi23 at gmx.de> writes:
> 
>> Hello.
>> 
>> In my program in need a loop from 'aaaa' over
>> 'aaac', 'aaag', 'aaat', 'aaca' to 'tttt'.
>> (Possible characters 'a', 'c', 'g' and 't')
>> 
>> One idea i had is to take a number n with the base of
>> 4 and use
>> 
>> t= string.translate( '0123', 'acgt')
>> string.translate( n, t)
>> 
>> But i don't know how to convert from base10 to base4.
>> 
>> Has anyone a idea?
> 
> 
> I like to use recursive generators for this sort of thing:
> 
> from __future__ import generators # Not needed in python 2.3
> 
> def base_iter():
>     for a in 'acgt': yield a
> 
> def nbase_iter(n):
>     if n==0: yield ''
>     else:
>         for first in base_iter():
>             for rest in nbase_iter(n-1):
>                 yield first + rest

I think a recursive generator is the most elegant and practical
idea (even though I might disagree with some implementation
details of your specific generator).  Still, "counting in base
N" is also quite acceptable -- and it should not need any base
conversion, since all you're doing is always "increment by 1"
(that's what "counting" stands for here).  E.g.:

def countInBase(base, ndigits):
    digits = ndigits * [0]
    while 1:
        curdig = ndigits - 1
        while curdig >= 0 and digits[curdig] >= base-1:
            digits[curdig] = 0
            curdig -= 1
        if curdig < 0:
            break
        digits[curdig] += 1
        yield digits

You may want to yield tuple(digits) or digits[:] if you need to
be able to call list(countInBase(x, y)) or the like, but this
is suitable for a normal for loop.  Of course, this yields items
which are lists of integers, not strings.  But, the structure IS
what the OP asked for -- we just need some small adaptation,
which can be done in several ways, to give the OP just what he
wants... for example:

def codonStrings(lenghtOfStrings, codons='acgt'):
    base = len(codons)-1
    result = lengthOfStrings * [codons[0]]
    while 1:
        curindex = lenghtOfStrings - 1
        while curindex >= 0 and result[curindex] >= base:
            result[curindex] = 0
            curindex -= 1
        if curindex < 0:
            break
        result[curindex] += 1
        yield ''.join( [codons[i] for i in result] )

As a matter of style, the outer loop's body might be recoded...:

        for curindex in range(lengthOfStrings-1,-1,-1):
            if result[curindex] < base: 
                result[curindex] += 1
                break
            result[curindex] = 0
        else:
            break

and the computation of "range(lengthOfStrings-1,-1,-1)" could,
and no doubt should, be hoisted out of the loop, giving:

def codonStrings(lenghtOfStrings, codons='acgt'):
    base = len(codons)-1
    indexSequence = range(lengthOfStrings-1,-1,-1)
    result = lengthOfStrings * [codons[0]]
    while 1:
        for curindex in indexSequence:
            if result[curindex] < base: 
                result[curindex] += 1
                break
            result[curindex] = 0
        else:
            break
        yield ''.join( [codons[i] for i in result] )

but I guess it's a matter of personal taste whether this
structure is perceived as clearer or less clear than the
original one with the two nested while loops.


Alex





More information about the Python-list mailing list