Extracting subsequences composed of the same character

Terry Reedy tjreedy at udel.edu
Fri Apr 1 00:18:56 EDT 2011


On 3/31/2011 10:20 PM, Tim Chase wrote:
> On 03/31/2011 07:43 PM, candide wrote:
>> "pyyythhooonnn ---> ++++"
>>
>> and you search for the subsequences composed of the same character, here
>> you get:
>>
>> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
>
> Or, if you want to do it with itertools instead of the "re" module:
>
>  >>> s = "pyyythhooonnn ---> ++++"
>  >>> from itertools import groupby
>  >>> [c*length for c, length in ((k, len(list(g))) for k, g in
>  ... groupby(s)) if length > 1]
> ['yyy', 'hh', 'ooo', 'nnn', '---', '++++']

Slightly shorter, joining each group back into a string instead of
counting its length:
[r for r in (''.join(g) for k, g in groupby(s)) if len(r) > 1]
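
For reuse, the same idea can be wrapped in a function. What follows is
only a sketch: the names repeated_runs and repeated_runs_re and the
min_len parameter are made up for illustration and are not from this
thread. The second function is one possible "re" version for
comparison, assuming a backreference pattern is acceptable.

```python
import re
from itertools import groupby


def repeated_runs(s, min_len=2):
    """Return the maximal runs of one repeated character in s.

    Only runs of at least min_len characters are kept
    (min_len is a hypothetical parameter added for illustration).
    """
    # groupby splits s into maximal groups of equal adjacent characters.
    runs = (''.join(g) for _, g in groupby(s))
    return [r for r in runs if len(r) >= min_len]


def repeated_runs_re(s):
    """Same result for runs of length >= 2, using a backreference."""
    # (.)\1+ matches a character followed by one or more copies of itself.
    # re.DOTALL lets '.' match newlines too, as groupby would.
    return [m.group(0) for m in re.finditer(r'(.)\1+', s, re.DOTALL)]


s = "pyyythhooonnn ---> ++++"
print(repeated_runs(s))
print(repeated_runs_re(s))
```

Both calls should print the list shown in the quoted session above. The
groupby version seems the easier one to adapt if the minimum run length
needs to change.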

-- 
Terry Jan Reedy



