Splitting a sequence into pieces with identical elements

Tim Chase python.list at tim.thechases.com
Tue Aug 10 21:18:36 EDT 2010


On 08/10/10 19:37, candide wrote:
> Suppose you have a sequence s, a string say, for instance this one:
>
> spppammmmegggssss
>
> We want to split s into the following parts :
>
> ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
>
> i.e. each part is a run of a single repeated character.

While I'm not sure it's idiomatic, the overabuse of regexps in 
Python certainly seems prevalent enough to be idiomatic ;-)

As such, you can use:

   import re
   # group 2 is one character; \2* extends the match over its repeats
   r = re.compile(r'((.)\2*)')
   # "word" characters only:
   #r = re.compile(r'((\w)\2*)')
   s = 'spppammmmegggssss'
   results = [m.group(0) for m in r.finditer(s)]

Additionally, you have all the properties of the match object 
(including the start/end positions) available too, should you 
need them.
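
As a rough sketch of what that looks like (the name "runs" is just 
for illustration, and the pattern is written with \2 because group 
2 is the single character), you can collect each run together with 
its span in the original string:

```python
import re

r = re.compile(r'((.)\2*)')
s = 'spppammmmegggssss'

# Each match object knows the matched run and where it sits in s.
runs = [(m.group(0), m.start(), m.end()) for m in r.finditer(s)]

# The spans are half-open, so s[start:end] gives back the run.
for text, start, end in runs:
    assert s[start:end] == text
```

Here runs[0] is ('s', 0, 1) and runs[-1] is ('ssss', 13, 17).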

You don't specify what you want to have happen with non-letters 
(whitespace, punctuation, etc).  The above just treats them like 
any other character, finding repeats (except newlines, which "." 
skips unless you compile with re.DOTALL).  If you just want "word" 
characters, you can use the 2nd ("\w") version, or adjust 
accordingly.
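
To make the difference concrete, here is a small sketch comparing 
the two patterns on a made-up string containing spaces and 
punctuation (the sample text and variable names are mine, not from 
the original question):

```python
import re

any_char = re.compile(r'((.)\2*)')     # any character except newline
word_only = re.compile(r'((\w)\2*)')   # only "word" characters

text = 'aa  bb!!c'

# "." groups the spaces and the punctuation into runs of their own.
all_runs = [m.group(0) for m in any_char.finditer(text)]

# "\w" never matches them, so finditer just steps over them.
word_runs = [m.group(0) for m in word_only.finditer(text)]
```

With this input, all_runs comes out as ['aa', '  ', 'bb', '!!', 'c'] 
and word_runs as ['aa', 'bb', 'c'].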

-tkc