Hello,

On Wed, 4 Jun 2014 12:32:12 +1000 Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano <steve@pearwood.info> wrote:
* Having a build-time option to restrict all strings to ASCII-only.
(I think what they mean by that is that strings will be like Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)
What I was actually suggesting along those lines was that the str type still be notionally a Unicode string, but that any codepoints >127 would either raise an exception or blow an assertion,
That's another reason why people don't like Unicode being enforced upon them: all the talk about supporting all languages and scripts is demagogy and hypocrisy. Given a choice, Unicode zealots would rather limit people to the Latin script than give up on their arbitrarily chosen, one-among-thousands, soon-to-be-replaced-by-Apple's-and-Microsoft's-"exciting-new" encoding. Once again, my claim is that what MicroPython implements now is the more correct handling, in a sense wider than just the technical one. We don't provide Unicode encoding support, because it's highly bloated, but we let people use any encoding they like. That comes at some price: for example, the length of a string in characters is not known to the runtime, only its length in bytes. But quite a lot of applications can be written with just that. And I'm saying this not to discourage adding Unicode to MicroPython, but to hint that the "force-force" approach implemented by CPython 3, which has caused rage and a split in the community, is not appreciated.
and all the code to handle multibyte representations would be compiled out. So there'd still be a difference between strings of text and streams of bytes, but all encoding and decoding to/from ASCII-compatible encodings would just point to the same bytes in RAM.
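Chris's suggestion can be sketched at the Python level as a minimal model, assuming a hypothetical `AsciiStr` type (not a MicroPython or CPython API) that stores text as raw bytes, rejects any code point above 127 with an explicit exception, and, for encodings in an assumed `ASCII_COMPATIBLE` set, encodes and decodes by handing back the very same bytes object rather than copying. It only illustrates the "text and bytes stay distinct types, but share storage" idea; a real port would do this in C.

```python
# Assumed list of encodings in which ASCII text is byte-for-byte identical.
ASCII_COMPATIBLE = frozenset(
    {"ascii", "us-ascii", "utf-8", "utf8", "latin-1", "latin1", "iso-8859-1"}
)


def _check_encoding(encoding: str) -> None:
    """Reject encodings where ASCII text would not map to the same bytes."""
    if encoding.lower().replace("_", "-") not in ASCII_COMPATIBLE:
        raise LookupError(f"not an ASCII-compatible encoding: {encoding!r}")


class AsciiStr:
    """Hypothetical ASCII-only text type: notionally Unicode, limited to 0..127."""

    __slots__ = ("_data",)

    def __init__(self, data: bytes) -> None:
        # Explicit check rather than an assert, so it cannot be compiled away.
        for index, byte in enumerate(data):
            if byte > 127:
                raise ValueError(f"code point {byte} > 127 at index {index}")
        # Reuse the caller's buffer when it is already an immutable bytes object.
        self._data = data if type(data) is bytes else bytes(data)

    @classmethod
    def decode(cls, data: bytes, encoding: str = "ascii") -> "AsciiStr":
        # Decoding is just validation: no new buffer is built.
        _check_encoding(encoding)
        return cls(data)

    def encode(self, encoding: str = "ascii") -> bytes:
        # Encoding returns the stored bytes themselves (same object, no copy).
        _check_encoding(encoding)
        return self._data

    def __len__(self) -> int:
        # For ASCII, the byte count and the character count are the same.
        return len(self._data)

    def __str__(self) -> str:
        return self._data.decode("ascii")


if __name__ == "__main__":
    raw = b"hello"
    text = AsciiStr.decode(raw, "utf-8")
    print(text.encode("latin-1") is raw, len(text))  # True 5
```

The design choice shown here is that the only cost of staying "Unicode-shaped" is the range check on the way in; everything after that is pointer sharing.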
Risk: Someone would implement that with assertions, then compile with assertions disabled, test only with ASCII, and have lurking bugs.
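The risk Chris describes concerns C-level `assert()`, which disappears when a build defines `NDEBUG`. The same failure mode can be shown in CPython, where `python -O` strips `assert` statements. The sketch below runs one hypothetical ASCII check written two ways (`check_with_assert` and `check_with_raise`, names made up for illustration) in a child interpreter, with and without `-O`.

```python
import subprocess
import sys

# Child program: the same ASCII check written two ways.
CHILD = '''
def check_with_assert(text):
    # Vanishes entirely when Python runs with -O.
    assert all(ord(ch) <= 127 for ch in text), "non-ASCII text"
    return text

def check_with_raise(text):
    # Always executed, regardless of optimization level.
    if any(ord(ch) > 127 for ch in text):
        raise ValueError("non-ASCII text")
    return text

for check in (check_with_assert, check_with_raise):
    try:
        check("caf\\u00e9")  # input an ASCII-only test suite would never try
        print(check.__name__, "accepted")
    except (AssertionError, ValueError) as exc:
        print(check.__name__, "rejected:", type(exc).__name__)
'''


def run_child(optimize: bool) -> str:
    """Run CHILD in a fresh interpreter, optionally with -O, and return its stdout."""
    # -E ignores PYTHON* variables (e.g. PYTHONOPTIMIZE) so only our flag matters.
    flags = ["-E"] + (["-O"] if optimize else [])
    result = subprocess.run(
        [sys.executable, *flags, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print("plain run:\n" + run_child(optimize=False))
    print("with -O:\n" + run_child(optimize=True))
```

In the plain run both checks reject the non-ASCII input; under `-O` the assert-based check silently accepts it, which is exactly the lurking bug that testing only with ASCII would never surface.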
ChrisA
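Paul's remark that a byte-oriented runtime knows a string's length only in bytes can be made concrete. Assuming the bytes happen to hold UTF-8 (with another encoding the rule differs, which is why the runtime cannot know the count in general), the character count needs a full scan that skips continuation bytes of the form `0b10xxxxxx`, while the byte length is simply stored. `utf8_char_count` is a hypothetical helper for illustration.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in UTF-8 data by skipping continuation bytes (0b10xxxxxx).

    Assumes `data` is valid UTF-8. This is an O(n) scan, whereas len(data) is O(1).
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


if __name__ == "__main__":
    raw = "na\u00efve caf\u00e9".encode("utf-8")  # "naïve café"
    print(len(raw), "bytes")                    # → 12 bytes (stored length)
    print(utf8_char_count(raw), "characters")   # → 10 characters (needs a scan)
```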
--
Best regards,
 Paul                          mailto:pmiscml@gmail.com