[Tutor] Finding the "streaks" in heads/tails list

Thu Oct 2 00:00:05 CEST 2008

> Regular expressions are for processing strings, not loops.

From a theoretical point of view, this isn't quite true: regular
expressions can deal with sequences of things.  It's true that most
regular expression libraries know how to deal only with characters,
but that's a matter of specializing the library for efficiency, and
not a general property of regexes.

But what regular expressions (i.e. finite-state automata) can't do
very well is count, since they have no memory beyond their current
state, and the task you're asking for (measuring how long each streak
runs) is fundamentally an anti-regexp one.

> I would loop through the list with a for loop, keeping track of the
> last value seen and the current count. If the current value is the
> same as the last, increment the count; if it is different, reset the
> count.

Agreed.  This seems direct.
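
A minimal sketch of that loop might look like the following.  The
function name streaks() and the 'H'/'T' encoding of the flips are my
own assumptions for illustration, not something from the original
question:

```python
def streaks(flips):
    """Return (value, run_length) pairs for each run of equal items.

    Hypothetical helper: the name and the return shape are assumptions.
    """
    result = []
    last = None   # last value seen
    count = 0     # length of the current run; 0 means nothing seen yet
    for value in flips:
        if count and value == last:
            # Same as the last value: extend the current run.
            count += 1
        else:
            # Different value: record the finished run, then reset.
            if count:
                result.append((last, count))
            last = value
            count = 1
    # Record the final run, if the input was non-empty.
    if count:
        result.append((last, count))
    return result


flips = list("HHTHHHTT")   # assumed encoding: 'H' for heads, 'T' for tails
print(streaks(flips))
```

Taking max() over the run lengths in the result would then give the
longest streak.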

If we want to be cute, we can also use the itertools.groupby()
function to do the clumping of identical sequential values for us.
For example:

#################################################
>>> import itertools
>>> for key, group in itertools.groupby('aaaabbbbcaaabaaaacc'):
...     print(key, len(list(group)))
...
a 4
b 4
c 1
a 3
b 1
a 4
c 2
#################################################
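
Applied to the heads/tails problem, the same idea could be sketched
roughly as below.  The function name longest_streak() and the 'H'/'T'
encoding are assumptions on my part; only itertools.groupby() itself
comes from the standard library:

```python
import itertools


def longest_streak(flips):
    """Return (value, length) of the longest run of equal items.

    Hypothetical helper; returns (None, 0) for an empty input.
    If several runs tie for longest, the earliest one wins.
    """
    best_value, best_length = None, 0
    for value, group in itertools.groupby(flips):
        # groupby yields each run lazily, so count its items by iterating.
        length = sum(1 for _ in group)
        if length > best_length:
            best_value, best_length = value, length
    return best_value, best_length


flips = list("HHTHHHTT")   # assumed encoding: 'H' for heads, 'T' for tails
print(longest_streak(flips))
```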

See the standard library documentation for more details on itertools.groupby():

    http://www.python.org/doc/lib/itertools-functions.html