On Thu, 5 Jun 2014 01:00:52 +1000 Chris Angelico firstname.lastname@example.org wrote:
On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky email@example.com wrote:
But you need non-ASCII characters to display the title of an MP3 track.
Yes, but to display a title, you don't need to do codepoint access at random - you need to either take a block of memory (length in bytes) and do something with it (pass to a C function, transfer over some bus, etc.), or *iterate in order* over codepoints in a string. All these operations are as efficient (O-notation) for UTF-8 as for UTF-32.
Suppose you have a long title, and you need to abbreviate it by dropping out words (delimited by whitespace), such that you keep the first word (always) and the last (if possible) and as many as possible in between. How are you going to write that? With PEP 393 or UTF-32 strings, you can simply record the index of every whitespace you find, count off lengths, and decide what to keep and what to ellipsize.
I'll submit an angry bug report along the lines of "WWWHAT, it's 3.5 and there's still no str.isplit()??!!11", then do it with re.finditer() (while submitting another report on the inconsistent naming scheme).
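
For what it's worth, here is a rough sketch of how the re.finditer() version might look. It only walks the string in order and never indexes into it by codepoint position. The function name abbreviate(), the max_len parameter, and the "..." marker are made up for illustration, and the exact output format (single spaces, one marker where words were dropped) is an assumption, not something specified above.

```python
import re


def abbreviate(title, max_len, ellipsis="..."):
    """Shorten a title by dropping whole words (hypothetical helper).

    Keeps the first word always, the last word if it fits, and as many
    of the words in between (taken in order) as the budget allows.
    The result may exceed max_len if the first word alone is too long.
    """
    # One sequential pass over the string; no random codepoint access.
    words = [m.group() for m in re.finditer(r"\S+", title)]
    if not words:
        return ""

    full = " ".join(words)
    if len(full) <= max_len or len(words) == 1:
        return full

    first, middle, last = words[0], words[1:-1], words[-1]
    head = [first]
    tail = [ellipsis, last]
    if len(" ".join(head + tail)) > max_len:
        # The last word does not fit; keep only the marker.
        tail = [ellipsis]

    # Greedily keep leading middle words while the result still fits.
    for word in middle:
        if len(" ".join(head + [word] + tail)) > max_len:
            break
        head.append(word)

    return " ".join(head + tail)


print(abbreviate("The Quick Brown Fox Jumps", 20))
```

The repeated join is quadratic in the number of words, which should not matter for track titles; a running length counter would avoid it.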