Re: [Python-ideas] Find first tuple argument for str.find and str.index

It could do something like that if you passed an argument telling it to quit on the first match. But that makes the return type depend on the passed argument, which I guess is not good. We'd already be doing that if we returned a dict, but this would return either a tuple or a dict.

You could drop the dict idea altogether, but you need to consider what to do if many (probably different) patterns match, all starting at the same location in the string. For this reason alone I don't think returning a (start, searchstring) tuple is sufficient.

Given that Aho & Corasick find everything you could want to know (all matches of all patterns), and that they do it in linear time (linear in the lengths of the text and the patterns plus the number of matches), it doesn't seem right to throw this information away, especially after going to the trouble of building and walking the trie.

Terry
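To make the "find everything in one pass" point concrete, here is a minimal pure-Python sketch of an Aho-Corasick search. It assumes the dict return shape mentioned in the thread ({pattern: [start positions]}); `aho_corasick` is a hypothetical name, not a stdlib or str function, and this is an illustration rather than a tuned implementation.

```python
from collections import deque


def aho_corasick(text, patterns):
    """Return {pattern: [start, ...]} for every occurrence of every pattern.

    Hypothetical helper: builds an Aho-Corasick automaton (a trie plus
    failure links) and walks `text` once. Empty patterns are ignored.
    """
    pats = [p for p in dict.fromkeys(patterns) if p]  # dedupe, drop empties
    goto = [{}]  # goto[state][char] -> next state (the trie edges)
    fail = [0]   # failure link for each state
    out = [[]]   # patterns recognised on reaching each state

    # 1. Insert every pattern into the trie.
    for pat in pats:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # 2. Compute failure links breadth-first; children of the root fail to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt].extend(out[fail[nxt]])  # inherit matches of the suffix state

    # 3. Walk the text once, following failure links on mismatches.
    matches = {pat: [] for pat in pats}
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches[pat].append(i - len(pat) + 1)  # convert end index to start
    return matches


print(aho_corasick("ushers", ["he", "she", "his", "hers"]))
# {'he': [2], 'she': [1], 'his': [], 'hers': [2]}
```

In that example "he" and "hers" both start at index 2, which is exactly the case where a single (start, searchstring) tuple would have to drop one of them.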

Terry Jones wrote:
I was thinking of something a bit more lightweight. For more complex stuff I think the 're' module already does pretty much what you are describing. It may even already take advantage of the algorithms you referred to. If not, that would be an important improvement to the re module. :-)

The use case I had in mind was finding starting and ending delimiters, and avoiding the following type of awkward code. (This would work for finding other things as well, of course.)

    start = 0
    while start < len(s):
        i1 = s.find('{', start)
        if i1 == -1:
            i1 = len(s)
        i2 = s.find('}', start)
        if i2 == -1:
            i2 = len(s)
        # etc... for as many search terms as you have...
        # or use a loop to locate each one.
        start = min(i1, i2)
        if start == len(s):
            break
        ...  # do something with s[start]
        ...  # (elided) start must be advanced past the match, e.g. start += 1

That works, but it has to go through the string once for each search term. Of course I would use 're' for anything more complex than a few fixed-length terms. The above could be simplified greatly to the following, which would be much quicker than what we have now and still not overly complex:

    start = 0
    while start < len(s):
        try:
            start = s.index(('{', '}'), start)
        except ValueError:
            break
        ...  # do something with s[start]
        ...  # (elided) start must be advanced past the match, e.g. start += 1
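Until something like this exists, the tuple-argument behaviour proposed above can be approximated with the re module. Below is a minimal sketch; `find_first` is a hypothetical name, not an existing str method, and returning -1 on failure mirrors str.find rather than raising ValueError like str.index. Joining the escaped terms into one alternation gives a single left-to-right pass over the string, although the regex engine still tries each alternative at each position.

```python
import re


def find_first(s, subs, start=0):
    """Return the lowest index >= start at which any string in subs occurs, or -1.

    Hypothetical helper approximating the proposed s.find(('{', '}'), start).
    """
    if not subs:
        return -1  # no search terms: nothing can match
    # re.escape keeps characters such as '.' or '{' literal.
    pattern = re.compile("|".join(re.escape(sub) for sub in subs))
    match = pattern.search(s, start)  # Pattern.search accepts a start position
    return match.start() if match else -1


# The delimiter loop from the message, rewritten with the helper.
s = "a{b}c{d}"
positions = []
start = 0
while True:
    start = find_first(s, ("{", "}"), start)
    if start == -1:
        break
    positions.append(start)  # stand-in for "do something with s[start]"
    start += 1  # step past the delimiter so the loop makes progress
print(positions)  # [1, 3, 5, 7]
```

The re module caches compiled patterns, but in a hot loop it would be cheaper to compile the alternation once outside the loop and call its search method directly.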
Thanks for the reference; I'll look into it. :-)

If the function returns something other than a simple index, then I think it will need to be a new function or method rather than just an alteration of str.index and str.find. In that case it may also need a PEP.

Cheers,
Ron
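If a new function did return more than an index, say an (index, substring) pair, it would also need a rule for Terry's case of several search strings starting at the same index. The sketch below is a hypothetical `find_any` (not an existing str method) that picks the lowest index and, on a tie, the longest substring; that tie-break is an assumption for illustration, not something agreed in the thread. For simplicity it calls str.find once per term, so it still makes one pass per search string.

```python
def find_any(s, subs, start=0):
    """Return (index, sub) for the leftmost occurrence of any sub, or (-1, None).

    Hypothetical helper. Ties (several subs starting at the same index)
    are broken in favour of the longest sub.
    """
    best_index, best_sub = -1, None
    for sub in subs:
        i = s.find(sub, start)  # one pass over s per search term
        if i == -1:
            continue
        if (best_index == -1 or i < best_index
                or (i == best_index and len(sub) > len(best_sub))):
            best_index, best_sub = i, sub
    return best_index, best_sub


print(find_any("ushers", ("he", "hers")))  # (2, 'hers')
print(find_any("abc", ("x", "y")))         # (-1, None)
```

Even with a tie-break, the first call reports only "hers" and silently drops the "he" that also starts at index 2, which is the information loss Terry objects to.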

Participants (2): Ron Adam, Terry Jones