[Python-ideas] FInd first tuple argument for str.find and str.index

Terry Jones terry at jon.es
Wed Sep 5 21:55:44 CEST 2007


Hi Ron

>>>>> "Ron" == Ron Adam <rrr at ronadam.com> writes:
Ron> I was thinking of something a bit more light weight.

Ah, now we got to what you actually want to do :-)

Ron> For more complex stuff I think the 're' module already does pretty
Ron> much what you are describing.  It may even already take advantage of
Ron> the algorithms you referred to.  If not, that would be an important
Ron> improvement to the re module. :-)

Yes, that would make a good SoC project. But as you say it may already be
done that way.

Ron> The use case I had in mind was to find starting and ending delimiters.
Ron> And to avoid the following type of awkward code.  (This would work for
Ron> finding other things as well of course.)

    start = 0
    while start < len(s):

        i1 = s.find('{', start)
        if i1 == -1:
            i1 = len(s)

        i2 = s.find('}', start)
        if i2 == -1:
            i2 = len(s)

        # etc... for as many search terms as you have...
        # or use a loop to locate each one.

        start = min(i1, i2)
        if start == len(s):
            break

        ...
        # do something with s[start]
        ...

Ron> That works but it has to go through the string once for each item.

It's worse than that. _Each time around the loop_ it tests all candidates
against all of the remaining text. Imagine matching L patterns that were
each 'a' * M against a text of 'a' * N. You're going to do (roughly) O(N *
L * M) comparisons, using naive string matching.

You could completely drop patterns that have already returned -1. You can
short-circuit a loop if you ever got a 0 index back.  Sorry if this seems
picky - I guess you're just writing quick pseudo-code.

Ron> Of course I would use 're' for anything more complex than a few fixed
Ron> length terms.

Yes. It depends partly on what you want back. You could write a super-fast
iterator based on A&C that told you everything you need to know, with
guaranteed linear worst case behavior, but that seems like overkill here
(and it may be in re, as you say).

Ron> If the function returns something other than a simple index, then I
Ron> think it will need to be a new function or method and not just an
Ron> alteration of str.index and str.find.  In that case it may also need a
Ron> PEP.

Be my guest :-)

Regards,
Terry



More information about the Python-ideas mailing list