String Splitter Brain Teaser

Brian van den Broek bvande at
Mon Mar 28 02:09:54 CEST 2005

James Stroud said unto the world upon 2005-03-27 17:39:
> Hello,
> I have strings represented as a combination of an alphabet (AGCT) and an 
> operator "/", that signifies degeneracy. I want to split these strings into 
> lists of lists, where the degeneracies are members of the same list and 
> non-degenerates are members of single item lists. An example will clarify 
> this:
>     ATT/GATA/G
> gets split to
> [['A'], ['T'], ['T', 'G'], ['A'], ['T'], ['A', 'G']]
> I have written a very ugly function to do this (listed below for the curious), 
> but intuitively I think this should only take a couple of lines for one 
> skilled in regex and/or listcomp. Any takers?
> James
> p.s. Here is the ugly function I wrote:
> def build_consensus(astr):
>   consensus = []       # the lol that will be returned
>   possibilities = []   # one element of consensus
>   consecutives = 0     # keeps track of how many in a row
>   for achar in astr:
>     if (achar == "/"):
>       consecutives = 0
>       continue
>     else:
>       consecutives += 1
>     if (consecutives > 1):
>       consensus.append(possibilities)
>       possibilities = [achar]
>     else:
>       possibilities.append(achar)
>   if possibilities:
>     consensus.append(possibilities)
>   return consensus
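James asks for a regex and/or list-comprehension version. Here is a minimal sketch of one. It assumes the input contains only the letters A, C, G, T and the "/" operator. The function name build_consensus_re and the pattern are illustrative and do not come from the thread. Each group is a base followed by zero or more "/base" alternatives, so re.findall picks out the groups and str.split breaks each one into its alternatives.

```python
import re

# One group = a base followed by zero or more "/base" alternatives.
# Assumes the input contains only A, C, G, T and "/".
GROUP = re.compile(r"[ACGT](?:/[ACGT])*")


def build_consensus_re(astr):
    """Split e.g. 'ATT/GATA/G' into [['A'], ['T'], ['T', 'G'], ...]."""
    # The pattern has no capturing groups, so findall returns whole matches
    # such as 'T/G', which split('/') turns into ['T', 'G'].
    return [group.split("/") for group in GROUP.findall(astr)]


print(build_consensus_re("ATT/GATA/G"))
# [['A'], ['T'], ['T', 'G'], ['A'], ['T'], ['A', 'G']]
```

Characters outside ACGT are silently skipped by findall rather than reported, so this version does no input validation.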


In the spirit of "Now I have two problems", I like to avoid regular 
expressions when I can. I don't think mine avoids a bit of ugly, but I, 
at least, find it easier to grok (YMMV):

def build_consensus(string):

    result = [[string[0]]]   # starts list with a list of first char
    accumulate = False       # True right after a '/' has been seen

    for char in string[1:]:

        if char == '/':
            accumulate = True

        else:
            if accumulate:
                # The pop removes the last list appended, and we use
                # its single item to build the new list to append.
                result.append([result.pop()[0], char])
                accumulate = False

            else:
                # A non-degenerate base gets its own single-item list.
                result.append([char])

    return result

(Since list.append returns None, which is false in a boolean test, this 
could use

    accumulate = result.append([result.pop()[0], char])

in place of the two lines in the if accumulate block, but I don't 
think that is a gain worth paying for.)
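Note that result.pop()[0] rebuilds the group from only the first item of the previous list, so a triple degeneracy such as A/G/T would lose the G (James's loop keeps all three). Below is a minimal variant that assumes runs of any length should be kept. The name build_consensus_multi is hypothetical. It extends the last group in place instead of rebuilding it. It also starts from an empty list, so an empty string gives [] rather than an IndexError from string[0].

```python
def build_consensus_multi(string):
    """Loop-based split that keeps every alternative, e.g. 'A/G/T' -> [['A', 'G', 'T']]."""
    result = []
    accumulate = False  # True right after a '/' has been seen

    for char in string:
        if char == "/":
            accumulate = True
            continue
        if accumulate and result:
            # Degenerate base: add it to the group it is joined to.
            result[-1].append(char)
        else:
            # Non-degenerate base: start a new single-item group.
            result.append([char])
        accumulate = False

    return result


print(build_consensus_multi("ATT/GATA/G"))
# [['A'], ['T'], ['T', 'G'], ['A'], ['T'], ['A', 'G']]
```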


Brian vdB
