Detecting repeated subsequences of identical items
Alain Ketterlin
alain at universite-de-strasbourg.fr.invalid
Thu Apr 21 03:25:16 EDT 2016
Steven D'Aprano <steve at pearwood.info> writes:
> I want to group repeated items in a sequence. For example, I can group
> repeated sequences of a single item at a time using groupby:
[...]
> Now I want to group subsequences. For example, I have:
>
> "ABCABCABCDEABCDEFABCABCABCB"
>
> and I want to group it into repeating subsequences. I can see two ways to
> group it:
>
> ABC ABC ABCDE ABCDE F ABC ABC ABC B
[...]
> or:
>
> ABC ABC ABC D E A B C D E F ABC ABC ABC B
[...]
> How can I do this? Does this problem have a standard name and/or solution?
Looks like a tough one. I don't think it has a name (I'm not even sure
to be able to formally define it). Lets say it is an instance of
"longest repeating substring"
(https://en.wikipedia.org/wiki/Longest_repeated_substring_problem --
which really does not say much).
Anyway, it looks like a job for a suffix trees.
Depending on what you are after, you may also be interested in the
sequitur algorithm (http://www.sequitur.info/).
-- Alain.
More information about the Python-list
mailing list