04.06.14 18:38, Paul Sokolovsky написав(ла):
Any non-trivial text parsing uses indices or regular expressions (and regular expressions themself use indices internally).
I keep hearing this stuff, and unfortunately so far don't have enough time to collect all that stuff and provide detailed response. So, here's spur of the moment response - hopefully we're in the same context so it is easy to understand.
So, gentlemen, you keep mixing up character-by-character random access to string and taking substrings of a string.
Character-by-character random access imply that you would need to scan thru (possibly almost) all chars in a string. That's O(N) (N-length of string). With varlength encoding (taking O(N) to index arbitrary char), there's thus concern that this would be O(N^2) op.
But show me real-world case for that. Common usecase is scanning string left-to-right, that should be done using iterator and thus O(N). Right-to-left scanning would be order(s) of magnitude less frequent, as and also handled by iterator.
html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. Common use case is quickly find substring starting from current position using str.find or re.search, process found token, advance position and repeat.