[Python-ideas] Find first tuple argument for str.find and str.index

Ron Adam rrr at ronadam.com
Wed Sep 5 20:19:36 CEST 2007



Terry Jones wrote:
>>>>>> "Mathias" == Mathias Panzenböck <grosser.meister.morti at gmx.net> writes:
> Mathias> I would expect such a method to return the index where one of the
> Mathias> given strings was found. Or maybe a tuple: (start, end) or a
> Mathias> tuple: (start, searchstring).
> 
> It could do something like that if you passed an argument telling it to
> quit on the first match. But that makes the return type depend on the
> passed arg, which I guess is not good. We'd already be doing that if we
> returned a dict, but this would return either a tuple or a dict.
> 
> You could drop the dict idea altogether, but you need to consider what to
> do if many (probably different) patterns match, all starting at the same
> location in the string. For this reason alone I don't think returning a
> (start, searchstring) tuple is sufficient.

I was thinking of something a bit more lightweight.

For more complex stuff I think the 're' module already does pretty much 
what you are describing.  It may even already take advantage of the 
algorithms you referred to.  If not, that would be an important improvement 
to the re module. :-)
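
As a rough illustration of the 're' route, here is a minimal sketch.  The 
helper name find_first and its (index, term) return value are my own 
assumptions for the example, not an existing API.  It assumes terms is a 
non-empty collection of non-empty strings.

```python
import re


def find_first(s, terms, start=0):
    """Return (index, term) for the earliest match of any term in s[start:].

    Returns (-1, None) when no term is found.  Hypothetical helper.
    """
    # re.escape makes each term match literally.  Longer terms are listed
    # first so that, at a given position, the longest alternative wins.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    m = pattern.search(s, start)
    if m is None:
        return -1, None
    return m.start(), m.group()
```

For example, find_first("a{b}c", ("{", "}")) gives (1, "{"), and repeating 
the call with start=2 gives (3, "}").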

The use case I had in mind was finding starting and ending delimiters 
while avoiding the following kind of awkward code.  (It would work for 
finding other things as well, of course.)

    start = 0
    while start < len(s):

        i1 = s.find('{', start)
        if i1 == -1:
            i1 = len(s)

        i2 = s.find('}', start)
        if i2 == -1:
            i2 = len(s)

        # etc... for as many search terms as you have...
        # or use a loop to locate each one.

        start = min(i1, i2)
        if start == len(s):
            break

        ...
        # do something with s[start]
        ...
        start += 1    # step past the match so the next pass makes progress

That works, but it has to go through the string once for each search term 
on every pass.  Of course, I would use 're' for anything more complex than 
a few fixed-length terms.

With a tuple argument, the above could be simplified greatly to the 
following, which would be much quicker than what we have now without being 
overly complex.

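    start = 0
    while start < len(s):
        try:
            # proposed behavior: today a tuple argument raises TypeError
            start = s.index(('{', '}'), start)
        except ValueError:
            break
        ...
        # do something with s[start]
        ...
        start += 1    # step past the match so the next pass makes progress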
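
To try out the calling shape today, something like the following sketch 
could stand in for the proposed method.  The names index_any and 
delimiter_positions are made up for this example.  Note that index_any 
still scans the string once per term, so it only shows how the loop would 
read, not the speedup a built-in version might give.

```python
def index_any(s, subs, start=0):
    """Return the lowest index at which any string in subs occurs in s[start:].

    Raises ValueError if none is found, mirroring str.index.
    Hypothetical stand-in for the proposed s.index(tuple, start).
    """
    found = [i for i in (s.find(sub, start) for sub in subs) if i != -1]
    if not found:
        raise ValueError("substring not found")
    return min(found)


def delimiter_positions(s, delimiters=("{", "}")):
    """Collect (index, character) for each delimiter in s, left to right."""
    positions = []
    start = 0
    while start < len(s):
        try:
            start = index_any(s, delimiters, start)
        except ValueError:
            break
        positions.append((start, s[start]))
        start += 1  # step past the match so the loop makes progress
    return positions
```

With s = "a{b{c}d}e", delimiter_positions(s) returns 
[(1, '{'), (3, '{'), (5, '}'), (7, '}')].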


> Given that Aho & Corasick find everything you could want to know (all
> matches of all patterns), and that they do it in linear time, it doesn't
> seem right to throw this information away - especially after going to the
> trouble of building and walking the trie.

Thanks for the reference, I'll look into it.  :-)
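
For anyone else following along, here is a small sketch of the 
Aho-Corasick idea as I understand it: build a trie of the patterns, add 
failure links with a breadth-first pass, then walk the text once and 
report every match.  The function names and the list-based node layout are 
my own choices for illustration, and it assumes the patterns are distinct, 
non-empty strings.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node
    for p in patterns:
        node = 0
        for ch in p:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(p)

    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit patterns that end at the suffix node as well.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(s, patterns):
    """Return (start, pattern) for every occurrence of every pattern in s."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    node = 0
    for i, ch in enumerate(s):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for p in out[node]:
            matches.append((i - len(p) + 1, p))
    return matches
```

With the usual textbook input, find_all("ushers", ["he", "she", "his", 
"hers"]) returns [(1, 'she'), (2, 'he'), (2, 'hers')].  Two patterns 
starting at the same index show the overlapping-match case Terry raised.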

If the function returns something other than a simple index, then I think 
it will need to be a new function or method and not just an alteration of 
str.index and str.find.  In that case it may also need a PEP.

Cheers,
    Ron



