
Terry Jones wrote:
It's worse than that. _Each time around the loop_ it tests all candidates against all of the remaining text. Imagine matching L patterns that were each 'a' * M against a text of 'a' * N. You're going to do (roughly) O(N * L * M) comparisons, using naive string matching.
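The worst case Terry describes can be made concrete with a small sketch that counts character comparisons in a naive multi-pattern scan (the function name and comparison accounting are mine, purely for illustration):

```python
def naive_find_all(text, patterns):
    """Naive multi-pattern scan: at every position, try every pattern
    character by character.  Returns (matches, comparisons) so the
    O(N * L * M) comparison count is visible."""
    comparisons = 0
    matches = []
    for pos in range(len(text)):
        for pat in patterns:
            m = 0
            while m < len(pat) and pos + m < len(text) and text[pos + m] == pat[m]:
                m += 1
                comparisons += 1
            if m < len(pat) and pos + m < len(text):
                comparisons += 1  # the final, mismatching comparison
            if m == len(pat):
                matches.append((pos, pat))
    return matches, comparisons
```

With `text = 'a' * N` and L patterns of length M, the comparison count grows roughly as N * L * M, exactly the blow-up described above.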
Yep, it's bad, which is why I don't want to do it this way. Of course it's much better than ... for char in string: ... etc ...
You could completely drop patterns that have already returned -1. You can short-circuit a loop if you ever got a 0 index back. Sorry if this seems picky - I guess you're just writing quick pseudo-code.
Well, it's a bit of both: pseudo-code, and from an actual problem I was working on. It was a first unoptimized version. The first rule is to get something that works, then make it fast after it's tested. Right? :-)

No, you aren't being any more picky than I am. I didn't like that solution either, which is why I suggested improving index and find just a bit. And I'm not really keen on this next one, although it will probably work better. (not tested)

    length = len(s)
    start = -1                           # so the first pass searches from 0
    cases = dict((c, -1) for c in terms)
    while 1:
        for term, i in cases.items():
            if i == start:
                i = s.find(term, start + 1)
                cases[term] = i if i > -1 else length
        start = min(cases.values())
        if start == length:
            break
        ... Do something with s[start]
    Return some result we constructed or found along the way.

That's an improvement, but it's still way more work than I think the problem should need. It's also complex enough that it's no longer obvious just what it's doing.

I still like the tuple version of index or find better. Much easier to read and understand:

    ...
    try:
        start = s.index(('{', '}'), start)
    except ValueError:
        break
    ...

    ...
    start = s.find(('{', '}'), start)
    if start == -1:
        break
    ...
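The proposed tuple-accepting find can be emulated today with a small pure-Python helper (a minimal sketch; `multi_find` is a hypothetical name, not an existing API):

```python
def multi_find(s, terms, start=0):
    """Return the lowest index >= start at which any string in `terms`
    occurs in `s`, or -1 if none occurs.  Emulates the proposed
    s.find(('{', '}'), start) behavior."""
    best = -1
    for term in terms:
        i = s.find(term, start)
        if i != -1 and (best == -1 or i < best):
            best = i
    return best
```

For example, `multi_find("a {b {c}} d", ("{", "}"))` returns 2, and passing `start=3` returns 5, so the loop above collapses to a single readable call per iteration.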
Ron> Of course I would use 're' for anything more complex than a few fixed Ron> length terms.
Yes. It depends partly on what you want back. You could write a super-fast iterator based on A&C that told you everything you need to know, with guaranteed linear worst case behavior, but that seems like overkill here (and it may be in re, as you say).
Regular expressions can be slower for small jobs because they carry more overhead; they're not always the fastest choice. And I've read they are not good at parsing nested delimiters. Cheers, Ron
Ron> If the function returns something other than a simple index, then I Ron> think it will need to be a new function or method and not just an Ron> alteration of str.index and str.find. In that case it may also need a Ron> PEP.
Be my guest :-)
Regards, Terry