[Python-Dev] Internal representation of strings and Micropython

Chris Angelico rosuav at gmail.com
Wed Jun 4 17:00:52 CEST 2014


On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky <pmiscml at gmail.com> wrote:
>> > But you need non-ASCII characters to display a title of MP3 track.
>
> Yes, but to display a title, you don't need to do codepoint access at
> random - you need to either take a block of memory (length in bytes) and
> do something with it (pass to a C function, transfer over some bus,
> etc.), or *iterate in order* over codepoints in a string. All these
> operations are as efficient (O-notation) for UTF-8 as for UTF-32.

Suppose you have a long title, and you need to abbreviate it by
dropping out words (delimited by whitespace), such that you keep the
first word (always) and the last (if possible) and as many as possible
in between. How are you going to write that? With PEP 393 or UTF-32
strings, you can simply record the index of every whitespace you find,
count off lengths, and decide what to keep and what to ellipsize.

> Some operations are not going to be as fast, so - oops - avoid doing
> them without good reason. And kindly drop expectations that doing
> arbitrary operations on *Unicode* are as efficient as you imagined.
> (Note the *Unicode* in general, not particular flavor of which you got
> used to, up to thinking it's the one and only "right" flavor.)

Not sure what you mean by flavors of Unicode. Unicode is a mapping of
codepoints to characters, not an in-memory representation. And I've
been working with Python 3.3 since before it came out, and with Pike
(which has a very similar model) for longer, and in both of them, I
casually perform operations on Unicode strings in the same way that I
used to perform operations on REXX strings (which were eight-bit in
the current system codepage - 437 for us). I do expect those
operations to be efficient, and I get what I expect.

Maybe they won't be in uPy, but that would be a limitation of uPy, not
a fundamental problem with Unicode.

ChrisA


More information about the Python-Dev mailing list