[Python-ideas] proposed sequence method: index_subseq()

Steven D'Aprano steve at pearwood.info
Wed Aug 28 10:43:15 CEST 2013


On Wed, Aug 28, 2013 at 11:21:36AM +0300, Tal Einat wrote:
> On Tue, Aug 27, 2013 at 7:19 PM, Jess Austin <jess.austin at gmail.com> wrote:
> > On Tue, Aug 27, 2013 at 10:27 AM, Tal Einat <taleinat at gmail.com> wrote:
> >>
> >> There are many algorithms for sub-sequence search, most of which can
> >> be significantly more efficient than the one you used under common
> >> circumstances. For example, see the Knuth-Morris-Pratt algorithm [1],
> >> and an example Python implementation on the ActiveState cookbook [2].
> >
> >
> > Good point; O(n+m) is better than O(n*m). Minor observation: KMP would
> > disallow the possibility I raised of subseq being just an iterator, rather
> > than a sequence. I think that's OK, since my use cases haven't had iterators
> > here. It actually seems more likely that the "containing" object will be an
> > iterator, which the recipe you linked would allow. Hmmm....
> 
> You meant that the searched sequence would be an iterator, not the
> sub-sequence, right? This should be possible with KMP or similar
> algorithms (e.g. as described under the "Variants" section in the KMP
> Wikipedia article).

What would be the point of that? Having searched the iterator for some 
sub-sequence, you will have consumed the iterator and can no longer do 
anything with the values consumed.

It is true that this works:


py> 4 in iter([2, 3, 4, 5])
True


but I believe that is a side-effect of how the ``in`` operator works, 
not a deliberate design feature.

I think it is perfectly reasonable to insist on actual sequences for 
sub-sequence testing, even if the algorithm happens to work on 
iterators.


-- 
Steven


More information about the Python-ideas mailing list