[Python-Dev] Internal representation of strings and Micropython

Wed Jun 4 16:49:30 CEST 2014

Hello,

On Thu, 5 Jun 2014 00:26:10 +1000
Chris Angelico <rosuav at gmail.com> wrote:

> On Thu, Jun 5, 2014 at 12:17 AM, Serhiy Storchaka
> <storchaka at gmail.com> wrote:
> > 04.06.14 10:03, Chris Angelico написав(ла):
> >
> >> Right, which is why I don't like the idea. But you don't need
> >> non-ASCII characters to blink an LED or turn a servo, and there is
> >> significant resistance to the notion that appending a non-ASCII
> >> character to a long ASCII-only string requires the whole string to
> >> be copied and doubled in size (lots of heap space used).
> >
> >
> > But you need non-ASCII characters to display a title of MP3 track.

Yes, but to display a title, you don't need to do codepoint access at
random - you need to either take a block of memory (length in bytes) and
do something with it (pass to a C function, transfer over some bus,
etc.), or *iterate in order* over codepoints in a string. All these
operations are as efficient (O-notation) for UTF-8 as for UTF-32.

Some operations are not going to be as fast, so - oops - avoid doing
them without good reason. And kindly drop expectations that doing
arbitrary operations on *Unicode* are as efficient as you imagined.
(Note the *Unicode* in general, not particular flavor of which you got
used to, up to thinking it's the one and only "right" flavor.)

> Agreed. IMO, any Python, no matter how micro, needs full Unicode
> support; but there is resistance from uPy's devs.

FUD ;-).

> 
> ChrisA

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com