Splitting a sequence into pieces with identical elements

Tim Chase python.list at tim.thechases.com
Wed Aug 11 03:18:36 CEST 2010

On 08/10/10 19:37, candide wrote:
> Suppose you have a sequence s, say a string, for instance this one:
> spppammmmegggssss
> We want to split s into the following parts :
> ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
> i.e. each part is a word made of a single repeated character.

While I'm not sure it's idiomatic, the overabuse of regexps in 
Python certainly seems prevalent enough to be idiomatic ;-)

As such, you can use:

   import re
   # The inner group (group 2) captures one character; \2* matches its repeats.
   r = re.compile(r'((.)\2*)')
   #r = re.compile(r'((\w)\2*)')
   s = 'spppammmmegggssss'
   results = [m.group(0) for m in r.finditer(s)]

Additionally, you have all the properties of the match object 
(including the start/end positions) available too, if you need them.
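
As a rough sketch of what that might look like (the name "runs" is just 
my own choice for illustration, and the pattern backreferences the inner 
single-character group), you could collect each piece together with its 
position in the string:

```python
import re

# \2 refers to the inner group, i.e. the single captured character.
r = re.compile(r'((.)\2*)')
s = 'spppammmmegggssss'

# For every run, keep the text plus its start and end offsets in s.
runs = [(m.group(0), m.start(), m.end()) for m in r.finditer(s)]

for text, start, end in runs:
    # s[start:end] is the same slice that the match reports.
    print(text, start, end)
```

The end offset is exclusive, so s[start:end] gives back the piece itself.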

You don't specify what you want to have happen with non-letters 
(whitespace, punctuation, etc).  The above just treats them like 
any other character, finding repeats.  If you just want "word" 
characters, you can use the 2nd ("\w") version, or adjust 
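
To make the difference between the two patterns concrete, here is a 
small sketch on a made-up input string (the sample string and the 
variable names are assumptions of mine, not from the original question):

```python
import re

any_char = re.compile(r'((.)\2*)')    # treats every character alike
word_char = re.compile(r'((\w)\2*)')  # only runs of "word" characters

s = 'aa  bb!!c'  # hypothetical input with spaces and punctuation

all_runs = [m.group(0) for m in any_char.finditer(s)]
word_runs = [m.group(0) for m in word_char.finditer(s)]

print(all_runs)   # ['aa', '  ', 'bb', '!!', 'c']
print(word_runs)  # ['aa', 'bb', 'c']
```

Keep in mind that \w also matches digits and the underscore, and that 
"." does not match a newline unless you pass re.DOTALL.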


