Quoting Steven D'Aprano firstname.lastname@example.org:
- Having a build-time option to restrict all strings to ASCII-only.
(I think what they mean by that is that strings will be like Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)
An ASCII-plus-arbitrary-bytes type called "str" would prevent claiming "Python 3.4 compatibility" for sure.
Restricting strings to ASCII (as Chris apparently actually suggested) would allow claiming compatibility, though only with a stretch: existing Python code might not run on such an implementation. However, since a lot of existing Python code wouldn't run on MicroPython anyway, one might claim to implement a Python 3.4 subset.
- Implementing Unicode internally as UTF-8, and giving up O(1) indexing operations.
Would either of these trade-offs be acceptable while still claiming "Python 3.4 compatibility"?
My own feeling is that O(1) string indexing operations are a quality of implementation issue, not a deal breaker to call it a Python. I can't see any requirement in the docs that str[n] must take O(1) time, but perhaps I have missed something.
I agree. It's an open question whether such an implementation would be practical, both in terms of existing Python code, and in terms of existing C extension modules that people might want to port to MicroPython.
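To make the indexing trade-off concrete, here is a minimal sketch of what str[n] costs when the data is held as UTF-8. The function name utf8_index is purely hypothetical and is not taken from MicroPython or CPython; the sketch only assumes that the buffer holds valid UTF-8 and ignores negative indexes.

```python
def utf8_index(buf: bytes, n: int) -> str:
    """Return the n-th code point of a valid UTF-8 buffer.

    Hypothetical helper for illustration only. It scans from the start
    of the buffer, so the cost grows with n instead of being O(1).
    """
    if n < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where the n-th code point begins
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    # The n-th code point is the last one in the buffer.
    return buf[start:].decode("utf-8")


text = "aé€z"
data = text.encode("utf-8")
print([utf8_index(data, i) for i in range(len(text))])
```

For pure-ASCII data every byte starts a code point, so an implementation could keep an O(1) fast path for that case; whether that is worth the extra code is a separate question.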
There are more things to consider for the internal implementation, in particular how the string length is implemented. Several alternatives exist:
1. store the UTF-8 length (i.e. memory size)
2. store the number of code points (i.e. Python len())
3. store both
4. store neither, but use null termination instead
Variant 3 is the most run-time efficient, but could easily use 8 bytes just for the lengths, which could outweigh the storage of the actual data. Variants 1 and 2 each lose on some operations (1 loses on computing len(), 2 loses on string concatenation). Variant 4 would add the restriction of not allowing U+0000 in a string (which would be reasonable IMO), and would make all length computations inefficient. However, it wouldn't be worse than standard C.
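As a rough illustration of why the two lengths differ and what it costs to derive one without having stored it, here is a small sketch. The helper names code_point_len and nul_terminated_size are hypothetical, and the sketch assumes valid UTF-8 input; it is not how any particular implementation does it.

```python
def code_point_len(buf: bytes) -> int:
    """Python-level len() derived from UTF-8 data.

    Needed when only the byte size is stored: an O(n) scan that counts
    every byte which is not a continuation byte (0b10xxxxxx).
    """
    return sum(1 for byte in buf if byte & 0xC0 != 0x80)


def nul_terminated_size(buf: bytes) -> int:
    """Byte size of a null-terminated buffer, like C's strlen().

    Needed when no length is stored at all: an O(n) scan for the
    terminator, which is also why U+0000 could not appear in the data.
    """
    return buf.index(0)


data = "naïve €".encode("utf-8")
print(len(data))                            # memory size in bytes
print(code_point_len(data))                 # number of code points
print(nul_terminated_size(data + b"\x00"))  # size found by scanning
```

Storing both numbers avoids both scans, at the price of two length fields per string, which is the space cost mentioned above.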