Splitting a sequence into pieces with identical elements
python.list at tim.thechases.com
Wed Aug 11 03:18:36 CEST 2010
On 08/10/10 19:37, candide wrote:
> Suppose you have a sequence s, a string say, for instance this one:
> spppammmmegggssss
> We want to split s into the following parts :
> ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
> ie each part is a single repeated character word.
While I'm not sure it's idiomatic, the overabuse of regexps in
Python certainly seems prevalent enough to be idiomatic ;-)
As such, you can use:
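import re
# (.) captures one character; \1* then matches any further repeats of it.
r = re.compile(r'(.)\1*')
#r = re.compile(r'(\w)\1*')  # same idea, "word" characters only
s = 'spppammmmegggssss'
# group(0) is the whole match, i.e. one run of identical characters
results = [m.group(0) for m in r.finditer(s)]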
Additionally, you have all the properties of the match object
(including the start/end positions) available too, if you need them.
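
As a rough sketch of what the match objects give you, something like the
following collects each run together with its start and end offsets. The
names run_pattern and runs are just made up for illustration, and the
pattern is written with a single capture group:

```python
import re

# One captured character followed by zero or more repeats of it.
run_pattern = re.compile(r'(.)\1*')
s = 'spppammmmegggssss'

# For every run, keep the text plus its slice boundaries, so that
# s[start:end] == text for each tuple.
runs = [(m.group(0), m.start(), m.end()) for m in run_pattern.finditer(s)]

for text, start, end in runs:
    print(text, start, end)
```

The end offset is exclusive, so the tuples can be used directly as slice
bounds on s.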
You don't specify what you want to have happen with non-letters
(whitespace, punctuation, etc). The above just treats them like
any other character, finding repeats. If you just want "word"
characters, you can use the 2nd ("\w") version, or adjust the
pattern accordingly.
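
To make the difference concrete, here is a small sketch comparing the two
patterns on a string containing spaces and punctuation. The sample string
and the names any_char and word_char are assumptions for illustration only,
and both patterns are written with a single capture group:

```python
import re

any_char = re.compile(r'(.)\1*')     # runs of any character (except newline)
word_char = re.compile(r'(\w)\1*')   # runs of "word" characters only

sample = 'aa  bb!!c'

# Whitespace and punctuation are split into runs like everything else.
all_runs = [m.group(0) for m in any_char.finditer(sample)]

# Non-word characters never match, so they are simply skipped.
word_runs = [m.group(0) for m in word_char.finditer(sample)]

print(all_runs)    # ['aa', '  ', 'bb', '!!', 'c']
print(word_runs)   # ['aa', 'bb', 'c']
```

Note that \w also matches digits and the underscore, so if you want letters
only, a character class such as [a-zA-Z] in place of \w may be closer to
what you need.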