Detecting repeated subsequences of identical items
Steven D'Aprano
steve at pearwood.info
Thu Apr 21 08:15:16 EDT 2016
On Thu, 21 Apr 2016 06:53 pm, Oscar Benjamin wrote:
> On 21 April 2016 at 04:07, Steven D'Aprano <steve at pearwood.info> wrote:
>> I want to group repeated items in a sequence. For example, I can group
>> repeated sequences of a single item at a time using groupby:
>>
>>
>> from itertools import groupby
>> for key, group in groupby("AAAABBCDDEEEFFFF"):
>> group = list(group)
>> print(key, "count =", len(group))
>>
>>
>> outputs:
>>
>> A count = 4
>> B count = 2
>> C count = 1
>> D count = 2
>> E count = 3
>> F count = 4
>>
>>
>> Now I want to group subsequences. For example, I have:
>>
>> "ABCABCABCDEABCDEFABCABCABCB"
>>
>> and I want to group it into repeating subsequences. I can see two ways to
>> group it:
>>
>> ABC ABC ABCDE ABCDE F ABC ABC ABC B
>
> There are some algorithms (helpfully shown in Python) here:
>
> https://en.wikipedia.org/wiki/Cycle_detection
It's not necessarily a cycle though. Consider a sequence of function calls:
def f(x):
return g(x)
def g(x):
if x < 7:
return h(x)
elif x < 50:
return g(x//2)
else:
return x+f(x-1)
def h(x):
raise ValueError # oops, a bug
and a function call f(54). That will result in the chain of calls:
f(54) -> g(54) -> f(53) -> g(53) -> f(52) -> g(52) -> f(51) -> g(51) ->
f(50) -> g(50) -> g(25) -> g(12) -> g(6) -> h(6) raises
So you have that almost-cycle f calls g calls f, but it isn't periodic
because you break out of it. I'd still like to detect the repeated f->g
calls.
--
Steven
More information about the Python-list
mailing list