Micro Python -- a lean and efficient implementation of Python 3
wxjmfauth at gmail.com
Tue Jun 10 03:32:34 EDT 2014
On Wednesday, June 4, 2014 at 13:53:19 UTC+2, Robin Becker wrote:
> On 04/06/2014 12:01, Tim Chase wrote:
> > On 2014-06-04 00:58, Paul Rubin wrote:
> >> Steven D'Aprano <steve at pearwood.info> writes:
> >>>> Maybe there's a use-case for a microcontroller that works in
> >>>> ISO-8859-5 natively, thus using only eight bits per character,
> >>> That won't even make the Russians happy, since in Russia there
> >>> are multiple incompatible legacy encodings.
> >>
> >> I've never understood why not use UTF-8 for everything.
> >
> > If you use UTF-8 for everything, then you end up in a world where
> > string-indexing (see ChrisA's other side thread on this topic) is no
> > longer an O(1) operation, but an O(N) operation. Some of us slice
> > strings for a living. ;-) I understand that using UTF-32 would allow
> > us to maintain O(1) indexing at the cost of every string occupying 4
> > bytes per character. The FSR (again, as I understand it) allows
> > strings that fit in one-byte-per-character to use that, scaling up to
> > use wider characters internally as they're actually needed/used.
> >
> ........
>
> I believe that we should distinguish between glyph/character indexing and string
> indexing. Even in unicode it may be hard to decide where a visual glyph starts
> and ends. I assume most people would like to assign one glyph to one unicode,
> but that's not always possible with composed glyphs.
>
> >>> for a in (u'\xc5',u'A\u030a'):
> ...     for o in (u'\xf6',u'o\u0308'):
> ...         u=a+u'ngstr'+o+u'm'
> ...         print("%s %s" % (repr(u),u))
> ...
> u'\xc5ngstr\xf6m' Ångström
> u'\xc5ngstro\u0308m' Ångström
> u'A\u030angstr\xf6m' Ångström
> u'A\u030angstro\u0308m' Ångström
> >>> u'\xc5ngstr\xf6m'==u'\xc5ngstro\u0308m'
> False
>
> so even unicode doesn't always allow for O(1) glyph indexing. I know this is
> artificial, but this is the same situation as utf8 faces just the frequency of
> occurrence is different. A very large amount of computing is still western
> centric so searching a byte string for latin characters is still efficient;
> searching for an n with a tilde on top might not be so easy.
> --
> Robin Becker
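
For reference, the inequality in the quoted session comes from comparing a precomposed spelling with a decomposed one. A minimal sketch, assuming Python 3 and the standard unicodedata module, of how normalizing both strings to NFC makes these particular spellings compare equal. The helper name nfc is made up for illustration and is not from the thread. Normalization only helps where a precomposed code point exists, so it does not give O(1) glyph indexing in general.

```python
import unicodedata

# Two spellings of the same visible word, taken from the quoted session.
composed = "\xc5ngstr\xf6m"            # precomposed A-ring and o-diaeresis
decomposed = "A\u030angstro\u0308m"    # base letters followed by combining marks


def nfc(s):
    """Hypothetical helper: return s in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


# Code point by code point, the two strings differ.
print(composed == decomposed)
# After normalization, both use the precomposed code points.
print(nfc(composed) == nfc(decomposed))
# Indexing and len() count code points, not visual glyphs.
print(len(composed), len(decomposed))
```

Indexing into either string stays O(1) per code point; what varies is how many code points make up one visible glyph.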
=========
Python has succeeded in becoming an anti-Unicode product!
jmf