[Python-ideas] FInd first tuple argument for str.find and str.index
Terry Jones
terry at jon.es
Wed Sep 5 18:14:30 CEST 2007
>>>>> "Guido" == Guido van Rossum <guido at python.org> writes:
Guido> I was surprised to find that startswith and endswith support this,
Guido> but it does make sense. Adding a patch to 2.6 would cause it to be
Guido> merged into 3.0 soon enough.
Guido> On 9/4/07, Ron Adam <rrr at ronadam.com> wrote:
>> Could we add the ability of str.index and str.find to accept a tuple as the
>> first argument and return the index of the first item found in it.
Hi
If someone is going to head down this path, it might be better to implement
a more general algorithm and provide the above as a special case via an
argument.
There's a fast and beautiful algorithm due to Aho & Corasick (CACM, 1975)
that finds _all_ matches of a set of patterns. It runs in time that's
linear in max(sum of the lengths of the patterns to be matched, length of
to-be-matched text). The algorithm was the basis of fgrep.
To provide the above, a special case could have it return as soon as it
found a first match (i.e., of any pattern).
One general way to write it would be to have it return a dict of patterns
and result indices in the case that the pattern argument is a tuple.
So
"Hey, look, look, look at these patterns".find(('look', 'for', 'these', 'patterns'))
might return
{
'look' : [ 5, 11, 17 ],
'for' : [ ], # or arguably [ -1 ],
'these' : [ 25 ],
'patterns' : [ 31 ],
}
OK, that's a bit of a departure from the normal behavior of find, but so is
passing a tuple of patterns. Alternately, you could also get back a tuple
of (tuples of) matching indices.
The ideal calling interface and result depends on what you need to do -
check if a specific string matched? Just know the first match offset, etc.
I don't know the best solution, but the algorithm rocks. Raymond - you'll
love it :-)
Terry
More information about the Python-ideas
mailing list