Splitting a sequence into pieces with identical elements
MRAB
python at mrabarnett.plus.com
Tue Aug 10 21:30:49 EDT 2010
Tim Chase wrote:
> On 08/10/10 19:37, candide wrote:
>> Suppose you have a sequence s, a string say, for instance this
>> one:
>>
>> spppammmmegggssss
>>
>> We want to split s into the following parts:
>>
>> ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
>>
>> i.e. each part is a run of a single repeated character.
>
> While I'm not sure it's idiomatic, the overabuse of regexps in Python
> certainly seems prevalent enough to be idiomatic ;-)
>
> As such, you can use:
>
> import re
> r = re.compile(r'((.)\1*)')
> #r = re.compile(r'((\w)\1*)')
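That should be \2, not \1: the repeated character is captured by the
inner group (group 2), and \1 would refer to the outer group, which is
still open at that point.
Alternatively, drop the outer group, since m.group(0) already gives the
whole match: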
r = re.compile(r'(.)\1*')
#r = re.compile(r'(\w)\1*')
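Putting the corrected pattern together with the rest of the quoted
code, a minimal sketch (the names single_group, nested_group and
results_nested are only illustrative) would look like this:

```python
import re

s = 'spppammmmegggssss'

# Single capturing group: \1 refers to the character just matched.
single_group = re.compile(r'(.)\1*')
results = [m.group(0) for m in single_group.finditer(s)]

# Nested form from the quoted post, with the backreference fixed to \2
# so that it points at the inner, already closed group.
nested_group = re.compile(r'((.)\2*)')
results_nested = [m.group(1) for m in nested_group.finditer(s)]

print(results)  # ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
```

Both forms should give the same list for this input; the single-group
version is just shorter.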
> s = 'spppammmmegggssss'
> results = [m.group(0) for m in r.finditer(s)]
>
> Additionally, you have all the properties of the match object (which
> include the start/end) available too, if you need them.
>
> You don't specify what you want to have happen with non-letters
> (whitespace, punctuation, etc). The above just treats them like any
> other character, finding repeats. If you just want "word" characters,
> you can use the 2nd ("\w") version, or adjust accordingly.
>
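To illustrate the difference between the "." and "\w" versions, and the
start/end information on the match objects, here is a small sketch. The
sample string and the variable names are made up for the example:

```python
import re

s = 'aa  bb!!c'

# "." treats spaces and punctuation like any other character.
any_char = re.compile(r'(.)\1*')
all_runs = [m.group(0) for m in any_char.finditer(s)]

# "\w" only starts a run on a word character, so runs of spaces and
# punctuation are skipped; start()/end() give each run's position in s.
word_char = re.compile(r'(\w)\1*')
word_runs = [(m.group(0), m.start(), m.end()) for m in word_char.finditer(s)]

print(all_runs)   # ['aa', '  ', 'bb', '!!', 'c']
print(word_runs)  # [('aa', 0, 2), ('bb', 4, 6), ('c', 8, 9)]
```

Note that "." does not match a newline unless re.DOTALL is given, so
with the first pattern newlines would be skipped rather than grouped.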