[Python-ideas] proposed sequence method: index_subseq()
Tal Einat
taleinat at gmail.com
Tue Aug 27 17:27:43 CEST 2013
On Tue, Aug 27, 2013 at 5:57 PM, Jess Austin <jess.austin at gmail.com> wrote:
> Recently I've repeatedly needed to check whether a particular sequence
> occurred as a "subsequence" of another. I think this could be a general
> requirement, so I'd like to ask if anyone else agrees. In python code, I
> have the following, although I can rewrite it in C if people on this list
> like it enough to add it to core:
>
> def index_subseq(self, subseq):
> '''slice over which subseq is a sub-sequence of sequence self'''
> i = -1
> while True:
> i = self.index(subseq[0], i + 1)
> if all((a == b) for a, b in zip(subseq, itertools.chain(self[i:],
> iter(object, None)))):
> return slice(i, i + len(subseq))
>
> (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5))) # slice(3, 6, None)
> [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5)))
> (0, 1, 2, 3, 4).index_subseq((3,4,5))) # ValueError
> [0, 1, 2, 3, 4].index_subseq((3,4,5)))
>
> [This listing omits the monkeypatching into the list and tuple builtins.]
>
> The index() method of more specialized sequences like str and bytearray
> already has behavior much like what I propose for this method, since it
> doesn't have to worry about elements of those sequences having elements in
> turn. This method returns a slice object: I considered but decided against
> returning just the initial index of the slice. As I've written it here, the
> method doesn't care whether it is passed a list or a tuple as the "subseq"
> argument. This could be generalized to take any iterable. If anyone has a
> suggestion for a better name than "index_subseq", that would be good too.
>
> If people think this is a good idea, I'll post a patch in the tracker.
>
> thanks,
> Jess
Hi Jess,
There are many algorithms for sub-sequence search, most of which can
be significantly more efficient than the one you used under common
circumstances. For example, see the Knuth-Morris-Pratt algorithm [1],
and an example Python implementation on the ActiveState cookbook [2].
A good implementation that would work relatively well in most common
cases could be useful, much as Python's sorting implementation is. I
think, however, that it would be better to start this as a 3rd party
module rather than push for immediate inclusion in the stdlib.
- Tal
[1] http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
[2] http://code.activestate.com/recipes/117214/
More information about the Python-ideas
mailing list