[Python-ideas] proposed sequence method: index_subseq()

Steven D'Aprano steve at pearwood.info
Tue Aug 27 19:34:37 CEST 2013


On 28/08/13 00:57, Jess Austin wrote:
> Recently I've repeatedly needed to check whether a particular sequence
> occurred as a "subsequence" of another. I think this could be a general
> requirement, so I'd like to ask if anyone else agrees. In python code, I
> have the following, although I can rewrite it in C if people on this list
> like it enough to add it to core:

Well, I personally think it is an obvious and useful piece of functionality:

http://code.activestate.com/recipes/577850-search-sequences-for-sub-sequence/

but a less naive implementation would be more useful.


> def index_subseq(self, subseq):
>      '''slice over which subseq is a sub-sequence of sequence self'''
>      i = -1
>      while True:
>          i = self.index(subseq[0], i + 1)
>          if all((a == b) for a, b in zip(subseq, itertools.chain(self[i:],
>                  iter(object, None)))):
>              return slice(i, i + len(subseq))
>
> (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5)))       # slice(3, 6, None)
> [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5)))
> (0, 1, 2, 3, 4).index_subseq((3,4,5)))                      # ValueError
> [0, 1, 2, 3, 4].index_subseq((3,4,5)))
>
> [This listing omits the monkeypatching into the list and tuple builtins.]

Okay, now I'm curious. Unless you're talking about patching the Python compiler, how on earth are you monkey-patching *into* list and tuple?



> The index() method of more specialized sequences like str and bytearray
> already has behavior much like what I propose for this method, since it
> doesn't have to worry about elements of those sequences having elements in
> turn. This method returns a slice object: I considered but decided against
> returning just the initial index of the slice.

I dislike that. The end and step parts of the slice are redundant: end is easily calculated as just start + len(subseq), and step will always be None. A more familiar, and obvious, functionality is to return the starting index, and then either raise an exception if not found, or return some sentinel value (not -1 since it can be used as an index in error).


>As I've written it here, the
> method doesn't care whether it is passed a list or a tuple as the "subseq"
> argument. This could be generalized to take any iterable. If anyone has a
> suggestion for a better name than "index_subseq", that would be good too.
>
> If people think this is a good idea, I'll post a patch in the tracker.

+0.5

I found this useful enough to write my own quick and dirty version, but not enough to spend time optimizing it.



-- 
Steven


More information about the Python-ideas mailing list