Hello,

On Wed, 04 Jun 2014 17:40:14 +0300
Serhiy Storchaka <storchaka@gmail.com> wrote:
> 04.06.14 17:02, Paul Moore wrote:
>> On 4 June 2014 14:39, Serhiy Storchaka <storchaka@gmail.com> wrote:
>>> I think that breaking the O(1) expectation for indexing makes the implementation significantly incompatible with Python. Virtually all string operations in Python operate with indices.
>> I don't use indexing on strings except in rare situations. Sure, I use lots of operations that may well use indexing *internally*, but that's the point. MicroPython can optimise those operations without needing to guarantee O(1) indexing, and I'd be fine with that.
> Any non-trivial text parsing uses indices or regular expressions (and regular expressions themselves use indices internally).
I keep hearing this stuff, and unfortunately so far haven't had enough time to collect it all and provide a detailed response. So, here's a spur-of-the-moment response - hopefully we're in the same context, so it is easy to understand.

So, gentlemen, you keep mixing up character-by-character random access to a string and taking substrings of a string.

Character-by-character random access implies that you need to scan through (possibly almost) all chars in a string. That's O(N) (N = length of the string). With a variable-length encoding (where indexing an arbitrary char takes O(N)), the concern is thus that this becomes an O(N^2) operation. But show me a real-world case for that. The common use case is scanning a string left to right, which should be done using an iterator and is thus O(N) (a minimal sketch follows at the end of this message). Right-to-left scanning is order(s) of magnitude less frequent, and is also handled by an iterator.

What's next? You're doing some funky anagrams and need to swap every 2 adjacent chars? Sorry, the naive implementation will be slow. If you're in serious anagram business, you'll need to code a C extension. No, wait! Instead, you should learn Python better: run a string windowing iterator which returns adjacent pairs, and swap those constant-length strings (also sketched below). More cases, anyone? Implementing DES and doing arbitrary permutations? Kindly stop doing that on strings - do it on bytes or lists. Hopefully the idea is clear: if you *scan* through a string using indexes in *random* order, you're doing a weird thing and *want* weird performance. As for stuff like s[0] or s[-1] - there's a finite (and small) number of such operations per string.

Now, about taking substrings of a string (which in Python is often expressed by slice indexing). This is quite different from scanning each character of a string. Just like s[0]/s[-1], it usually happens a finite number of times for a particular string, independent of its length, i.e. O(1) times (e.g. you take a string and split it into 3 parts). Or maybe the number of substrings is not fixed, but has a different growth order, O(M) (for example, you split a string into tokens; tokens can be long, but there are usually external limits on how many it's sensible to have on one line). So, again, you're not going to get quadratic time unless you're unlucky or sloppy. And once again, you should brush up your Python skills and use the regex functions which return iterators to get your parsed tokens, etc. (sketch below). (To clarify the obvious: "you" here is an abstract pronoun, not referring to the respectable Python developers who actually made it possible to write efficient Python programs.)

So, hopefully the point is conveyed: you can write inefficient Python programs. CPython goes out of its way to hide many inefficiencies (with unbelievably bloated heap usage - from uPy's point of view, which starts up in a 2K heap). You just shouldn't write inefficient programs, voila. But if you want, you can keep writing inefficient programs - they just will be inefficient. Peace.
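For concreteness, the promised sketches. First, the iterator-based scan (count_vowels is a made-up helper, nothing from uPy): iterating the string directly visits each char exactly once, so the whole pass stays O(N) even if s[i] itself were O(i) under a variable-length encoding. Whether reversed() gets the same treatment is up to the implementation.

    # Hypothetical helper: scan left to right with the string's own
    # iterator instead of s[i] in a loop.
    def count_vowels(s):
        count = 0
        for ch in s:            # iterator advances one char at a time: O(N) total
            if ch in "aeiou":
                count += 1
        return count

    print(count_vowels("onomatopoeia"))   # -> 8

    # Right-to-left is the same story with reversed():
    for ch in reversed("hello"):
        print(ch)               # o, l, l, e, h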
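Second, the windowing trick for the anagram case - a sketch assuming all you want is adjacent chars swapped; feeding the same iterator to zip() twice yields consecutive pairs in a single O(N) pass, no random indexing:

    # Sketch: swap every 2 adjacent chars without random access.
    def swap_adjacent(s):
        it = iter(s)
        out = []
        for a, b in zip(it, it):   # yields (s[0], s[1]), (s[2], s[3]), ...
            out.append(b)
            out.append(a)
        if len(s) % 2:             # odd length: zip dropped the unpaired last
            out.append(s[-1])      # char; s[-1] is one of those finite edge accesses
        return "".join(out)

    print(swap_adjacent("abcdef"))   # -> "badcfe"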
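And third, the tokenizing case: re.finditer() hands matches back lazily in one left-to-right pass, each token arriving as its own substring (this is CPython's re with a made-up token pattern; uPy's regex module is a subset, so take it as a sketch of the approach rather than a uPy recipe):

    import re

    # Tokenize a line lazily; no manual index bookkeeping needed.
    line = "alpha = beta + 42"
    for m in re.finditer(r"\w+|[^\w\s]", line):
        print(m.group(0))       # alpha, =, beta, +, 42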
> It would be interesting to collect statistics on how many indexing operations happen during the lifetime of a string in a typical (Micro)Python program.
Yup.

--
Best regards,
 Paul                          mailto:pmiscml@gmail.com