[Python-ideas] proposed sequence method: index_subseq()
Steven D'Aprano
steve at pearwood.info
Wed Aug 28 10:43:15 CEST 2013
On Wed, Aug 28, 2013 at 11:21:36AM +0300, Tal Einat wrote:
> On Tue, Aug 27, 2013 at 7:19 PM, Jess Austin <jess.austin at gmail.com> wrote:
> > On Tue, Aug 27, 2013 at 10:27 AM, Tal Einat <taleinat at gmail.com> wrote:
> >>
> >> There are many algorithms for sub-sequence search, most of which can
> >> be significantly more efficient than the one you used under common
> >> circumstances. For example, see the Knuth-Morris-Pratt algorithm [1],
> >> and an example Python implementation on the ActiveState cookbook [2].
> >
> >
> > Good point; O(n+m) is better than O(n*m). Minor observation: KMP would
> > disallow the possibility I raised of subseq being just an iterator, rather
> > than a sequence. I think that's OK, since my use cases haven't had iterators
> > here. It actually seems more likely that the "containing" object will be an
> > iterator, which the recipe you linked would allow. Hmmm....
>
> You meant that the searched sequence would be an iterator, not the
> sub-sequence, right? This should be possible with KMP or similar
> algorithms (e.g. as described under the "Variants" section in the KMP
> Wikipedia article).
What would be the point of that? Having searched the iterator for some
sub-sequence, you will have consumed the iterator and can no longer do
anything with the values consumed.
It is true that this works:
py> 4 in iter([2, 3, 4, 5])
True
but I believe that is a side-effect of how the ``in`` operator works,
not a deliberate design feature.
I think it is perfectly reasonable to insist on actual sequences for
sub-sequence testing, even if the algorithm happens to work on
iterators.
--
Steven
More information about the Python-ideas
mailing list