[Python-Dev] Internal representation of strings and Micropython

Paul Sokolovsky pmiscml at gmail.com
Wed Jun 4 12:38:57 CEST 2014


Hello,

On Wed, 4 Jun 2014 12:32:12 +1000
Chris Angelico <rosuav at gmail.com> wrote:

> On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano
> <steve at pearwood.info> wrote:
> > * Having a build-time option to restrict all strings to ASCII-only.
> >
> >   (I think what they mean by that is that strings will be like
> > Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)
> 
> What I was actually suggesting along those lines was that the str type
> still be notionally a Unicode string, but that any codepoints >127
> would either raise an exception or blow an assertion,

That's another reason why people don't like Unicode enforced upon them
- all the talk about supporting all languages and scripts is demagogy
and hypocrisy, given a choice, Unicode zealots would rather limit
people to Latin script then give up on their arbitrarily chosen,
one-among-thousands,
soon-to-be-replaced-by-apples'-and-microsofts'-"exciting-new" encoding.

Once again, my claim is what MicroPython implements now is more correct
- in a sense wider than technical - handling. We don't provide Unicode
encoding support, because it's highly bloated, but let people use any
encoding they like. That comes at some price, like length of strings in
characters are not know to runtime, only in bytes, but quite a lot of
applications can be written by having just that.

And I'm saying that not to discourage Unicode addition to MicroPython,
but to hint that "force-force" approach implemented by CPython3 and
causing rage and split in the community is not appreciated.

> and all the code
> to handle multibyte representations would be compiled out. So there'd
> still be a difference between strings of text and streams of bytes,
> but all encoding and decoding to/from ASCII-compatible encodings would
> just point to the same bytes in RAM.
> 
> Risk: Someone would implement that with assertions, then compile with
> assertions disabled, test only with ASCII, and have lurking bugs.
> 
> ChrisA


-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com


More information about the Python-Dev mailing list