finding homopolymers in both directions
Peter Otten
__peter__ at web.de
Tue Aug 3 14:31:35 EDT 2010
Lee Sander wrote:
> Hi,
> Suppose I have a string such as this
> 'aabccccccefggggghiiijkr'
>
> I would like to print out all the positions that are flanked by a run
> of symbols.
> So for example, I would like the output for the above input to be as
> follows:
>
> 2 b 1 aa
> 2 b -1 cccccc
> 10 e -1 cccccc
> 11 f 1 ggggg
> 17 h 1 iii
> 17 h -1 ggggg
>
> where the first column is the position of interest, the next column is
> the entry at that position,
> 1 if the following column refers to a run that comes after and -1 if
> the run comes before
Trying to follow your spec I came up with

from itertools import groupby
from collections import namedtuple

Item = namedtuple("Item", "pos key size")

def compact(seq):
    pos = 0
    for key, group in groupby(seq):
        size = len(list(group))
        yield Item(pos, key, size)
        pos += size

def window(items):
    items = iter(items)
    prev = None
    cur = next(items)
    for nxt in items:
        yield prev, cur, nxt
        prev = cur
        cur = nxt
    yield prev, cur, None

items = compact("aabccccccefggggghiiijkr")
for prev, cur, nxt in window(items):
    if cur.size == 1:
        if prev is not None:
            if prev.size > 1:
                print cur.pos, cur.key, -1, prev.key*prev.size
        if nxt is not None:
            if nxt.size > 1:
                print cur.pos, cur.key, 1, nxt.key*nxt.size

However, this gives a slightly different output (positions are 0-based,
-1 marks a run that comes before, and j is reported as well):
$ python homopolymers.py
2 b -1 aa
2 b 1 cccccc
9 e -1 cccccc
10 f 1 ggggg
16 h -1 ggggg
16 h 1 iii
20 j -1 iii
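
For anyone on Python 3, here is a minimal sketch of the same idea that
collects the rows in a list instead of printing them as it goes. The names
runs() and flanked() are made up for this illustration and are not part of
the script above, and key * n assumes the input is a string.

```python
from itertools import groupby


def runs(seq):
    """Yield (start, symbol, length) for each maximal run; start is 0-based."""
    pos = 0
    for key, group in groupby(seq):
        size = sum(1 for _ in group)
        yield pos, key, size
        pos += size


def flanked(seq):
    """Return (pos, symbol, direction, run) for single symbols next to a run.

    direction is -1 when the run comes before the symbol, 1 when it comes after.
    """
    items = list(runs(seq))
    found = []
    for i, (pos, key, size) in enumerate(items):
        if size != 1:
            continue  # only isolated symbols are positions of interest
        if i > 0 and items[i - 1][2] > 1:
            _, k, n = items[i - 1]
            found.append((pos, key, -1, k * n))
        if i + 1 < len(items) and items[i + 1][2] > 1:
            _, k, n = items[i + 1]
            found.append((pos, key, 1, k * n))
    return found


for row in flanked("aabccccccefggggghiiijkr"):
    print(*row)
```

Indexing into a list of runs replaces the prev/cur/nxt window, which costs
a little memory but also copes with an empty input string.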
Peter