Splitting a sequence into pieces with identical elements
Tim Chase
python.list at tim.thechases.com
Tue Aug 10 22:31:09 EDT 2010
On 08/10/10 20:30, MRAB wrote:
> Tim Chase wrote:
>> r = re.compile(r'((.)\1*)')
>> #r = re.compile(r'((\w)\1*)')
>
> That should be \2, not \1.
>
> Alternatively:
>
> r = re.compile(r'(.)\1*')
Doh, I had played with both and mis-transcribed the combination
of them into one malfunctioning regexp. My original trouble with
the 2nd one was that r.findall() (not .finditer) was only
returning the first letter of each run, because with a single
group findall() returns that group's text rather than the whole
match. Wrapping it in an extra set of parens and using "\2"
returned the actual data in sub-tuples:
>>> s = 'spppammmmegggssss'
>>> import re
>>> r = re.compile(r'(.)\1*')
>>> r.findall(s) # no repeated text, just the initial letter
['s', 'p', 'a', 'm', 'e', 'g', 's']
>>> [m.group(0) for m in r.finditer(s)]
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
>>> r = re.compile(r'((.)\2*)')
>>> r.findall(s)
[('s', 's'), ('ppp', 'p'), ('a', 'a'), ('mmmm', 'm'), ('e', 'e'),
('ggg', 'g'), ('ssss', 's')]
>>> [m.group(0) for m in r.finditer(s)]
['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
Changing to .finditer() and taking m.group(0) then made both
patterns work the way I wanted.
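For the archives, here is a rough sketch of the two approaches above
wrapped up as functions. The names split_runs and split_runs_findall
are made up for illustration, and the re.DOTALL flag is my own
assumption (without it "." does not match a newline, so runs of
newlines would be skipped); treat it as a sketch, not a polished
recipe.

```python
import re

# Pattern from the session above; re.DOTALL is an added assumption so
# that "." also matches newline characters.
RUN = re.compile(r'(.)\1*', re.DOTALL)


def split_runs(s):
    """Return the maximal runs of identical characters in s, in order."""
    # group(0) is always the whole match, however many groups there are.
    return [m.group(0) for m in RUN.finditer(s)]


def split_runs_findall(s):
    """Same result using findall() with the two-group pattern."""
    # With two groups, findall() yields (whole_run, first_char) tuples;
    # keep only the whole run.
    return [run for run, _ in re.findall(r'((.)\2*)', s, re.DOTALL)]


print(split_runs('spppammmmegggssss'))
# ['s', 'ppp', 'a', 'mmmm', 'e', 'ggg', 'ssss']
```

The finditer() version is probably the one to prefer, since it does
not depend on how many groups the pattern happens to contain.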
Thanks for catching my mistranscription.
-tkc