flaming vs accuracy [was Re: Performance of int/long in Python 3]

Ian Foote ian at feete.org
Thu Mar 28 10:36:19 CET 2013

On 28/03/13 09:03, jmfauth wrote:
> The problem is elsewhere. Nobody understand the examples
> I gave on this list, because nobody understand Unicode.
> These examples are not random examples, they are well
> thought.
> If you were understanding the coding of the characters,
> Unicode and what this flexible representation does, it
> would not be a problem for you to create analog examples.
> So, we are turning into circles.
> This flexible representation succeeds to cumulate in one
> shoot all the design mistakes it is possible to do, when
> one wishes to implements Unicode.
> Example of a good Unicode understanding.
> If you wish 1) to preserve memory, 2) to cover the whole range
> of Unicode, 3) to keep maximum performance while preserving the
> good work Unicode.org as done (normalization, sorting), there
> is only one solution: utf-8. For this you have to understand,
> what is really a "unicode transformation format".
> Why all the actors, active in the "text field", like MicroSoft,
> Apple, Adobe, the unicode compliant TeX engines, the foundries,
> the "organisation" in charge of the OpenType font specifications,
> are able to handle all this stuff correctly (understanding +
> implementation) and Python not?, I should say this is going
> beyond my understanding.
> Python has certainly and definitvely not "revolutionize"
> Unicode.
> jmf

You're confusing Python's choice of internal string representation with
the programmer's choice of encoding for communicating with other programs.
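
To make the distinction concrete, here is a minimal sketch (the sample
string and variable names are just mine, for illustration). The str is a
sequence of code points whatever the interpreter stores internally; the
encoding only comes into play when you turn it into bytes for the
outside world.

```python
# A str is a sequence of code points; how CPython stores it is hidden.
text = "na\u00efve \u20ac"  # 'naive' with a diaeresis, a space, a euro sign

# Encoding is chosen by the programmer at the boundary with other programs.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Length and indexing of the str are in code points, not bytes.
n_chars = len(text)
third_char = text[2]

# The byte lengths differ per encoding, but both round-trip to the same str.
same_text = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == text
```

Nothing in that code depends on which internal representation the
interpreter picked.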

I think most people agree that UTF-8 is usually the best encoding to use
for interoperating with other Unicode-aware software, but as a
variable-length encoding it has disadvantages that make it unsuitable
for use as an internal representation.

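Specifically, indexing a variable-length encoding like UTF-8 is not as
efficient as indexing a fixed-length encoding: finding the n-th character
means scanning from the start of the data, whereas with a fixed width it
is a simple offset calculation.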
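
As a rough sketch of what I mean (these helpers, utf8_index, utf32_index
and _char_len, are hypothetical and written only for this post; they are
not how CPython does it, and they assume well-formed input):

```python
def _char_len(lead: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data: O(n), must walk from the start."""
    if n < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("code point index out of range")
        pos += _char_len(data[pos])  # skip one whole character
    if pos >= len(data):
        raise IndexError("code point index out of range")
    return data[pos:pos + _char_len(data[pos])].decode("utf-8")


def utf32_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE data: O(1), a plain offset."""
    start = 4 * n
    if n < 0 or start >= len(data):
        raise IndexError("code point index out of range")
    return data[start:start + 4].decode("utf-32-le")


sample = "a\u20ac\U0001F600z"  # 1-, 3-, 4- and 1-byte characters in UTF-8
as_utf8 = sample.encode("utf-8")
as_utf32 = sample.encode("utf-32-le")
```

Both functions give the same answer for every index into sample, but
utf8_index has to step over every preceding character, while
utf32_index just multiplies the index by four.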

Ian F
