On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky <pmiscml@gmail.com> wrote:
>> But you need non-ASCII characters to display the title of an MP3 track.
> Yes, but to display a title, you don't need random access to codepoints - you need either to take a block of memory (length in bytes) and do something with it (pass it to a C function, transfer it over some bus, etc.), or to *iterate in order* over the codepoints in a string. All of these operations are as efficient (in O-notation) for UTF-8 as for UTF-32.
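For concreteness, here's a minimal sketch of that in-order iteration (the function name and example are mine, and it assumes well-formed UTF-8 input with no validation). Each step does O(1) work, so a full pass is O(n), the same as for a fixed-width encoding:

    def iter_codepoints(buf):
        # Walk a UTF-8 encoded bytes object in order, yielding one
        # codepoint (as an int) per step.  Each step is O(1), so a
        # full pass is O(n), matching a fixed-width encoding.
        # Assumes well-formed UTF-8; no validation is done.
        i = 0
        while i < len(buf):
            b = buf[i]
            if b < 0x80:        # 0xxxxxxx: 1-byte sequence (ASCII)
                cp, size = b, 1
            elif b < 0xE0:      # 110xxxxx: 2-byte sequence
                cp, size = b & 0x1F, 2
            elif b < 0xF0:      # 1110xxxx: 3-byte sequence
                cp, size = b & 0x0F, 3
            else:               # 11110xxx: 4-byte sequence
                cp, size = b & 0x07, 4
            for j in range(i + 1, i + size):
                cp = (cp << 6) | (buf[j] & 0x3F)
            yield cp
            i += size

    print(list(iter_codepoints("naïve".encode("utf-8"))))
    # -> [110, 97, 239, 118, 101]   (239 is U+00EF, "ï")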
Suppose you have a long title, and you need to abbreviate it by dropping words (delimited by whitespace), such that you keep the first word (always), the last (if possible), and as many as possible in between. How are you going to write that? With PEP 393 or UTF-32 strings, you can simply record the index of every whitespace character you find, count off lengths, and decide what to keep and what to ellipsize.
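A minimal sketch of that approach (the function name and the `limit` parameter are hypothetical, and it assumes single-space-delimited words plus the O(1) codepoint indexing that PEP 393 provides):

    def abbreviate(title, limit):
        # One O(n) pass records the index of every space.  The first
        # word is always kept, the last word is kept if it fits, and
        # middle words are kept greedily until adding another would
        # push the result past the limit.
        spaces = [i for i, ch in enumerate(title) if ch == " "]
        if not spaces or len(title) <= limit:
            return title
        first = title[:spaces[0]]        # always kept
        last = title[spaces[-1] + 1:]    # kept if possible
        out = first
        if len(first) + len(" ... ") + len(last) <= limit:
            for a, b in zip(spaces, spaces[1:]):
                word = title[a + 1:b]
                if len(out) + 1 + len(word) + len(" ... ") + len(last) <= limit:
                    out += " " + word
                else:
                    break
            out += " ... " + last
        return out

    print(abbreviate("The Quick Brown Fox Jumps Over The Lazy Dog", 25))
    # -> "The Quick Brown ... Dog"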
> Some operations are not going to be as fast, so - oops - avoid doing them without good reason. And kindly drop the expectation that arbitrary operations on *Unicode* are as efficient as you imagined. (Note: *Unicode* in general, not the particular flavor you got used to - to the point of thinking it's the one and only "right" flavor.)
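The operation that is genuinely slow under UTF-8 is random codepoint access. A minimal sketch (names mine) of what s[i] costs there:

    def byte_offset(buf, index):
        # On UTF-8 bytes, finding where codepoint `index` lives means
        # counting every lead byte before it: O(index).  A fixed-width
        # representation gets there with one multiplication: O(1).
        count = 0
        for ofs, b in enumerate(buf):
            if b & 0xC0 != 0x80:    # not a continuation byte, so a codepoint starts here
                if count == index:
                    return ofs
                count += 1
        raise IndexError("codepoint index out of range")

    print(byte_offset("naïve".encode("utf-8"), 3))  # -> 4 (the "v")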
Not sure what you mean by flavors of Unicode. Unicode is a mapping of codepoints to characters, not an in-memory representation. And I've been working with Python 3.3 since before it came out, and with Pike (which has a very similar model) for longer, and in both of them I casually perform operations on Unicode strings in the same way that I used to perform operations on REXX strings (which were eight-bit in the current system codepage - 437 for us). I do expect those operations to be efficient, and I get what I expect. Maybe they won't be in uPy, but that would be a limitation of uPy, not a fundamental problem with Unicode.

ChrisA