Detecting repeated subsequences of identical items
Steven D'Aprano
steve at pearwood.info
Wed Apr 20 23:07:36 EDT 2016
I want to group repeated items in a sequence. For example, I can group
repeated sequences of a single item at a time using groupby:
from itertools import groupby
for key, group in groupby("AAAABBCDDEEEFFFF"):
group = list(group)
print(key, "count =", len(group))
outputs:
A count = 4
B count = 2
C count = 1
D count = 2
E count = 3
F count = 4
Now I want to group subsequences. For example, I have:
"ABCABCABCDEABCDEFABCABCABCB"
and I want to group it into repeating subsequences. I can see two ways to
group it:
ABC ABC ABCDE ABCDE F ABC ABC ABC B
giving counts:
(ABC) count = 2
(ABCDE) count = 2
F count = 1
(ABC) count = 3
B repeats 1 time
or:
ABC ABC ABC D E A B C D E F ABC ABC ABC B
giving counts:
(ABC) count = 3
D count = 1
E count = 1
A count = 1
B count = 1
C count = 1
D count = 1
E count = 1
F count = 1
(ABC) count = 3
B count = 1
How can I do this? Does this problem have a standard name and/or solution?
--
Steven
More information about the Python-list
mailing list