[Python-ideas] proposed sequence method: index_subseq()
Steven D'Aprano
steve at pearwood.info
Tue Aug 27 19:34:37 CEST 2013
On 28/08/13 00:57, Jess Austin wrote:
> Recently I've repeatedly needed to check whether a particular sequence
> occurred as a "subsequence" of another. I think this could be a general
> requirement, so I'd like to ask if anyone else agrees. In python code, I
> have the following, although I can rewrite it in C if people on this list
> like it enough to add it to core:
Well, I personally think it is an obvious and useful piece of functionality:
http://code.activestate.com/recipes/577850-search-sequences-for-sub-sequence/
but a less naive implementation would be more useful.
> def index_subseq(self, subseq):
> '''slice over which subseq is a sub-sequence of sequence self'''
> i = -1
> while True:
> i = self.index(subseq[0], i + 1)
> if all((a == b) for a, b in zip(subseq, itertools.chain(self[i:],
> iter(object, None)))):
> return slice(i, i + len(subseq))
>
> (0, 1, 2, 3, 4, 5, 6, 7, 8).index_subseq((3,4,5))) # slice(3, 6, None)
> [0, 1, 2, 3, 4, 5, 6, 7, 8].index_subseq((3,4,5)))
> (0, 1, 2, 3, 4).index_subseq((3,4,5))) # ValueError
> [0, 1, 2, 3, 4].index_subseq((3,4,5)))
>
> [This listing omits the monkeypatching into the list and tuple builtins.]
Okay, now I'm curious. Unless you're talking about patching the Python compiler, how on earth are you monkey-patching *into* list and tuple?
> The index() method of more specialized sequences like str and bytearray
> already has behavior much like what I propose for this method, since it
> doesn't have to worry about elements of those sequences having elements in
> turn. This method returns a slice object: I considered but decided against
> returning just the initial index of the slice.
I dislike that. The end and step parts of the slice are redundant: end is easily calculated as just start + len(subseq), and step will always be None. A more familiar, and obvious, functionality is to return the starting index, and then either raise an exception if not found, or return some sentinel value (not -1 since it can be used as an index in error).
>As I've written it here, the
> method doesn't care whether it is passed a list or a tuple as the "subseq"
> argument. This could be generalized to take any iterable. If anyone has a
> suggestion for a better name than "index_subseq", that would be good too.
>
> If people think this is a good idea, I'll post a patch in the tracker.
+0.5
I found this useful enough to write my own quick and dirty version, but not enough to spend time optimizing it.
--
Steven
More information about the Python-ideas
mailing list