String Splitter Brain Teaser
Bill Mill
bill.mill at gmail.com
Mon Mar 28 12:40:51 EST 2005
On Mon, 28 Mar 2005 09:18:38 -0800, Michael Spencer
<mahs at telcopartners.com> wrote:
> Bill Mill wrote:
>
> > for very long genomes he might want a generator:
> >
> > def xgen(s):
> >     l = len(s) - 1
> >     e = enumerate(s)
> >     for i,c in e:
> >         if i < l and s[i+1] == '/':
> >             e.next()
> >             i2, c2 = e.next()
> >             yield [c, c2]
> >         else:
> >             yield [c]
> >
> >
> > >>> for g in xgen('ATT/GATA/G'): print g
> > ...
> > ['A']
> > ['T']
> > ['T', 'G']
> > ['A']
> > ['T']
> > ['A', 'G']
> >
> > Peace
> > Bill Mill
> > bill.mill at gmail.com
>
> works according to the original spec, but there are a couple of issues:
>
> 1. the output is specified to be a list, so delaying the creation of the list
> isn't a win
True. However, if it is a really long genome, he's not going to want
to have both a string of the genome and a list of the genome in
memory. Instead, I thought it might be useful to iterate through the
genome so that the list never has to be stored in memory. Since he didn't
specify what he wants the list for, it's possible that he just needs
to iterate through the genome, grouping degeneracies as he goes.
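
To make the memory argument concrete, here is a minimal sketch of consuming the groups lazily. It is not the generator from this thread: it swaps in a regular expression with re.finditer, and the names iter_groups and _GROUP are made up for illustration. It is written for Python 3 and assumes every '/' sits between two bases.

```python
import re

# One base, optionally followed by any number of "/base" alternatives.
# (_GROUP and iter_groups are hypothetical names, not from the thread.)
_GROUP = re.compile(r"[^/](?:/[^/])*")

def iter_groups(s):
    """Lazily yield each position of s as a list of alternative bases."""
    for m in _GROUP.finditer(s):
        yield m.group().split("/")

# Count degenerate positions without ever building the full list.
degenerate = sum(1 for g in iter_groups("ATT/GATA/G") if len(g) > 1)
print(degenerate)  # prints 2
```

Only one group is alive at a time here, so the extra memory stays constant no matter how long the genome string is.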
>
> 2. this version falls down in the presence of "double degeneracies" (if that's
> what they should be called) - which were not in the OP spec, but which cropped
> up in a later post :
> >>> list(xgen("AGC/C/TGA/T"))
> [['A'], ['G'], ['C', 'C'], ['/'], ['T'], ['G'], ['A', 'T']]
This is simple enough to fix, in basically the same way your function
works. I think it actually makes the function simpler:
def xgen(s):
    e = enumerate(s)
    stack = [e.next()[1]] #push the first char into the stack
    for i,c in e:
        if c != '/':
            yield stack
            stack = [c]
        else:
            stack.append(e.next()[1])
    yield stack
>>> gn
'ATT/GATA/G/AT'
>>> for g in xgen(gn): print g
...
['A']
['T']
['T', 'G']
['A']
['T']
['A', 'G', 'A']
['T']
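
For anyone running this on Python 3, where iterators no longer have a .next() method, a rough port of the same stack idea might look like the sketch below. The empty-string guard is an addition of mine, and the code still assumes the string never ends in '/'.

```python
def xgen(s):
    """Yield each position of s as a list of alternative bases."""
    chars = iter(s)
    try:
        stack = [next(chars)]  # push the first char into the stack
    except StopIteration:
        return  # empty input: nothing to yield (added guard, not in the original)
    for c in chars:
        if c != '/':
            # A plain base closes the previous group and starts a new one.
            yield stack
            stack = [c]
        else:
            # A '/' means the next char is another alternative for this group.
            # Assumption: the input never ends in '/'.
            stack.append(next(chars))
    yield stack

for g in xgen('ATT/GATA/G/AT'):
    print(g)
```

Because each yield is followed by rebinding stack to a fresh list, callers can keep the yielded lists without seeing them change later.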
Peace
Bill Mill
bill.mill at gmail.com