Extracting subsequences composed of the same character
Roy Smith
roy at panix.com
Thu Mar 31 21:40:38 EDT 2011
In article <4d952008$0$3943$426a74cc at news.free.fr>,
candide <candide at free.invalid> wrote:
> Suppose you have a string, for instance
>
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character, here
> you get :
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
I got the following. It's O(n), with the minor exception that building
each bunch by string addition isn't; that's trivial to fix, and in
practice the bunches are short enough that it hardly matters.
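
#!/usr/bin/env python

s = "pyyythhooonnn ---> ++++"
answer = ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

last = None
bunches = []
bunch = ''
for c in s:
    if c == last:
        # Same character as before: extend the current bunch.
        bunch += c
    else:
        # Character changed: save the finished bunch and start a new one.
        if bunch:
            bunches.append(bunch)
        bunch = c
    last = c
# Don't forget the bunch still in progress when the string ends.
bunches.append(bunch)

multiples = [bunch for bunch in bunches if len(bunch) > 1]
print(multiples)
assert multiples == answer
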
[eagerly awaiting a PEP for collections.bunch and
collections.frozenbunch]
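
For what it's worth, here is one way the string-addition caveat might be
fixed. This is only a sketch, not the code from the post: instead of growing
each bunch one character at a time, it remembers where the current run
started and slices it out once the run ends. The function names
repeated_runs and repeated_runs_groupby are made up for illustration. The
second function does the same job with itertools.groupby from the standard
library, which groups consecutive equal items.

```python
from itertools import groupby


def repeated_runs(s):
    """Return the maximal runs of one repeated character (length > 1)."""
    runs = []
    start = 0  # index where the current run begins
    for i in range(1, len(s) + 1):
        # A run ends at the end of the string or where the character changes.
        if i == len(s) or s[i] != s[start]:
            if i - start > 1:
                runs.append(s[start:i])  # one slice per run, no += in a loop
            start = i
    return runs


def repeated_runs_groupby(s):
    """Same result, letting groupby split s into runs of equal characters."""
    runs = ["".join(group) for _, group in groupby(s)]
    return [run for run in runs if len(run) > 1]


if __name__ == "__main__":
    s = "pyyythhooonnn ---> ++++"
    print(repeated_runs(s))
    print(repeated_runs_groupby(s))
```

Both versions copy each character of a run once, so the total work stays
linear in the length of the string.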