[Python-Dev] Internal representation of strings and Micropython

Wed Jun 4 17:53:52 CEST 2014

Hello,

On Thu, 5 Jun 2014 01:00:52 +1000
Chris Angelico <rosuav at gmail.com> wrote:

> On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky <pmiscml at gmail.com>
> wrote:
> >> > But you need non-ASCII characters to display a title of MP3
> >> > track.
> >
> > Yes, but to display a title, you don't need to do codepoint access
> > at random - you need to either take a block of memory (length in
> > bytes) and do something with it (pass to a C function, transfer
> > over some bus, etc.), or *iterate in order* over codepoints in a
> > string. All these operations are as efficient (O-notation) for
> > UTF-8 as for UTF-32.
> 
> Suppose you have a long title, and you need to abbreviate it by
> dropping out words (delimited by whitespace), such that you keep the
> first word (always) and the last (if possible) and as many as possible
> in between. How are you going to write that? With PEP 393 or UTF-32
> strings, you can simply record the index of every whitespace you find,
> count off lengths, and decide what to keep and what to ellipsize.

I'll submit an angry bug report along the lines of "WWWHAT, it's 3.5 and
there's still no str.isplit()??!!11", then do it with re.finditer()
(while submitting another report on the inconsistent naming scheme).
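
For the record, a minimal sketch of how the re.finditer() route could
look. The names abbreviate(), max_len and ellipsis are made up for
illustration, and the policy for "as many as possible in between" (keep
the leading middle words, then an ellipsis, then the last word) is just
one assumption about what is wanted. It only walks the string in order,
so it needs no random codepoint access:

```python
import re

def abbreviate(title, max_len, ellipsis="..."):
    """Shorten title to at most max_len code points by dropping words.

    Hypothetical helper: the first word is always kept (even if it alone
    exceeds max_len), the last word is kept if it fits, and as many
    leading middle words as fit are kept in between.
    """
    # re.finditer() stands in for the non-existent str.isplit():
    # it scans the string once, in order, yielding each word.
    words = [m.group() for m in re.finditer(r"\S+", title)]
    if not words:
        return ""
    full = " ".join(words)  # note: runs of whitespace are collapsed
    if len(full) <= max_len:
        return full
    first, last = words[0], words[-1]
    if len(words) == 1:
        return first
    tail = " " + ellipsis + " " + last
    if len(first) + len(tail) > max_len:
        # No room for the last word: keep only the first one.
        return first + " " + ellipsis
    kept = first
    for word in words[1:-1]:
        candidate = kept + " " + word
        if len(candidate) + len(tail) > max_len:
            break
        kept = candidate
    return kept + tail

print(abbreviate("The quick brown fox jumps over the lazy dog", 25))
```

Lengths here are counted with len(), i.e. in codepoints, which is a
sequential count and not an indexed lookup.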

[]

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com