On Thu, 05 Jun 2014 16:54:11 +0900 "Stephen J. Turnbull" firstname.lastname@example.org wrote:
Paul Sokolovsky writes:
Please put that in perspective when alarming over O(1) indexing of an inherently problematic niche datatype. (Again, it's not my or MicroPython's fault that it was forced in as the standard string type. Maybe if CPython had seriously considered the now-standard UTF-8 encoding, what the "str" type is would have turned out differently. But CPython has gigabytes of heap to spare, and for MicroPython, every half-bit is precious.)
Would you please stop trolling? The reasons for adopting Unicode as a separate data type were good and sufficient in 2000, and they remain
If it had been kept at bay as a "separate data type", there wouldn't be any problem. But it was made the "one and only string type", and that's when all the strife started.
And there is going to be "trolling" as long as Python developers and decision-makers ignore (troll?) the outcry from the community (again, I was surprised, and not surprised, to see that ~50% of the traffic on python-list touches on Unicode issues).
Well, I understand the plan: hoping that people will "get over this". And I'm personally happy to stay away from this "trolling", but any discussion related to Unicode goes in circles and comes back to the feeling that the central role Python 3 gave Unicode is misplaced.
so today, even if you have been fortunate enough not to burn yourself on character-byte conflation yet.
What matters to you is that str (unicode) is an opaque type -- there is no specification of the internal representation in the language reference, and in fact several different ones coexist happily across existing Python implementations -- and you're free to use a UTF-8 implementation if that suits the applications you expect for MicroPython.
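To make the trade-off concrete, here is a minimal sketch of a UTF-8-backed string, assuming a hypothetical Utf8Str class (this is not MicroPython's or CPython's actual code). Storage is just the UTF-8 bytes, so ASCII text costs one byte per character, but locating the n-th code point means scanning from the start, so indexing is O(n) rather than O(1). By contrast, PEP 393 picks 1, 2, or 4 bytes per code point for the whole string based on its widest character, which is what buys CPython O(1) indexing.

```python
class Utf8Str:
    """Hypothetical UTF-8-backed string: compact storage, O(n) indexing."""

    def __init__(self, text: str) -> None:
        # Internal representation is the raw UTF-8 encoding.
        self._data = text.encode("utf-8")

    def __len__(self) -> int:
        # Every byte that is not a continuation byte (0b10xxxxxx)
        # starts a new code point, so counting them gives the length.
        return sum(1 for b in self._data if b & 0xC0 != 0x80)

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += len(self)
        if index < 0:
            raise IndexError("Utf8Str index out of range")
        count = -1
        start = None
        # Linear scan: walk the bytes until the index-th code point is found.
        for pos, b in enumerate(self._data):
            if b & 0xC0 != 0x80:  # lead byte of a new code point
                count += 1
                if count == index:
                    start = pos
                elif count == index + 1:
                    return self._data[start:pos].decode("utf-8")
        if start is None:
            raise IndexError("Utf8Str index out of range")
        # The requested code point is the last one in the string.
        return self._data[start:].decode("utf-8")


s = Utf8Str("na\u00efve \u20ac")
print(len(s), s[2], s[-1], len(s._data))
```

Here "na\u00efve \u20ac" holds 7 code points in 10 bytes; whether the linear scan is acceptable depends on how often an application indexes strings by position, which is exactly the kind of requirement that differs between implementations.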
PEP 393 exists, of course, and specifies the current internal representation for CPython 3. But I don't see anything in it that suggests it's mandated for any other implementation.
I knew all of this very well already. What's strange is that other developers don't know, or don't take seriously, all of the above. That's why the gentleman who kindly took an interest in adding Unicode support to MicroPython started with the idea of dragging in CPython's implementation. And the only effect of persuading him that it's not necessarily the best solution was that he started to feel he was being manipulated into writing something ugly, instead of the bright idea he had.
That's why another gentleman reduces it to: "O(1) on string indexing or not a Python!".
And that's why another gentleman, who agrees with the UTF-8 arguments, still offers an excuse (https://mail.python.org/pipermail/python-dev/2014-June/134727.html): "In this context, while a fixed-width encoding may be the correct choice it would also likely be the wrong choice."
In this regard, I'm glad to participate in a mind-resetting discussion. So, let's reiterate: there's nothing like "the best", "the only right", "the only correct", "righter than", or "more correct than" in CPython's implementation of Unicode storage. It is *arbitrary*. Well, sure, it's not arbitrary, but based on requirements, and these requirements match CPython's (implied) usage model well enough. But among all possible sets of requirements, CPython's are no more valid than any other. And other sets of requirements fairly clearly lead to a situation where CPython's implementation is rejected as not correct for those requirements at all.