Extracting subsequences composed of the same character
MRAB
python at mrabarnett.plus.com
Thu Mar 31 21:16:47 EDT 2011
On 01/04/2011 01:43, candide wrote:
> Suppose you have a string, for instance
>
> "pyyythhooonnn ---> ++++"
>
> and you search for the subsequences composed of the same character; here
> you get:
>
> 'yyy', 'hh', 'ooo', 'nnn', '---', '++++'
>
> It's not difficult to write Python code that solves the problem, for
> instance:
>
[snip]
>
> I should confess that this code is rather cumbersome, so I was looking
> for an alternative. I imagine that a regular expression approach could
> provide a better method. Does such code exist? Note that the string
> is not restricted to the ASCII charset.
>>> import re
>>> s = "pyyythhooonnn ---> ++++"
>>> re.findall(r"((.)\2+)", s)
[('yyy', 'y'), ('hh', 'h'), ('ooo', 'o'), ('nnn', 'n'), ('---', '-'),
('++++', '+')]
>>> [m[0] for m in re.findall(r"((.)\2+)", s)]
['yyy', 'hh', 'ooo', 'nnn', '---', '++++']
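
For comparison, here is a rough sketch of two variations on the same idea, not a definitive answer. The first uses re.finditer so that only one capture group is needed, and passes re.DOTALL because "." does not match a newline by default. The second avoids regular expressions and uses itertools.groupby. The names runs_re and runs_gb are mine, purely for illustration, and the sketch assumes Python 3, where str holds Unicode code points (a base character followed by a combining mark still counts as two different characters).

```python
import re
from itertools import groupby

s = "pyyythhooonnn ---> ++++"

# finditer yields match objects, so group(0) is the whole run and
# only the single repeated character needs to be captured.
runs_re = [m.group(0) for m in re.finditer(r"(.)\1+", s, re.DOTALL)]

# groupby collects consecutive equal characters into groups;
# keep only the groups that are at least two characters long.
groups = ("".join(g) for _, g in groupby(s))
runs_gb = [run for run in groups if len(run) > 1]

print(runs_re)
print(runs_gb)
```

Both lists should contain the same runs as the findall version above; which one reads better is largely a matter of taste.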