[Python-ideas] FInd first tuple argument for str.find and str.index

Terry Jones terry at jon.es
Wed Sep 5 18:14:30 CEST 2007


>>>>> "Guido" == Guido van Rossum <guido at python.org> writes:
Guido> I was surprised to find that startswith and endswith support this,
Guido> but it does make sense. Adding a patch to 2.6 would cause it to be
Guido> merged into 3.0 soon enough.

Guido> On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>> Could we add the ability of str.index and str.find to accept a tuple as the
>> first argument and return the index of the first item found in it.

Hi

If someone is going to head down this path, it might be better to implement
a more general algorithm and provide the above as a special case via an
argument.

There's a fast and beautiful algorithm due to Aho & Corasick (CACM, 1975)
that finds _all_ matches of a set of patterns. It runs in time that's
linear in max(sum of the lengths of the patterns to be matched, length of
to-be-matched text). The algorithm was the basis of fgrep.

To provide the above, a special case could have it return as soon as it
found a first match (i.e., of any pattern).

One general way to write it would be to have it return a dict of patterns
and result indices in the case that the pattern argument is a tuple.

So

  "Hey, look, look, look at these patterns".find(('look', 'for', 'these', 'patterns'))

might return

    {
      'look'     : [ 5, 11, 17 ],
      'for'      : [ ],  # or arguably [ -1 ],
      'these'    : [ 25 ],
      'patterns' : [ 31 ],
    }

OK, that's a bit of a departure from the normal behavior of find, but so is
passing a tuple of patterns. Alternately, you could also get back a tuple
of (tuples of) matching indices.

The ideal calling interface and result depends on what you need to do -
check if a specific string matched? Just know the first match offset, etc.

I don't know the best solution, but the algorithm rocks. Raymond - you'll
love it :-)

Terry



More information about the Python-ideas mailing list