RE Module Performance
wxjmfauth at gmail.com
Wed Jul 24 09:40:55 EDT 2013
On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote:
> On 07/12/2013 09:59 AM, Joshua Landau wrote:
> > If you're interested, the basic idea is that strings now use a
> > variable number of bytes to encode their values depending on whether
> > values outside of the ASCII range and some other range are used, as an
> > optimisation.
>
> Variable number of bytes is a problematic way of saying it. UTF-8 is a
> variable-number-of-bytes encoding scheme where each character can be
> 1 to 4 bytes, depending on the Unicode character. As you can
> imagine, this sort of encoding scheme would be very slow to do slicing
> with (looking up a character at a certain position). Python uses
> fixed-width encoding schemes, so they preserve the O(1) lookup speeds,
> but Python will use 1, 2, or 4 bytes for every character in the string,
> depending on what is needed. Just in case the OP might have
> misunderstood what you are saying.
>
> jmf sees the case where a string is promoted from one width to another,
> and thinks that the brief slowdown in string operations to accomplish
> this is a problem. In reality I have never seen anyone use the types of
> string operations his pseudo-benchmarks use, and in general Python 3's
> string behavior is pretty fast. And apparently much more correct than
> if jmf's ideas of Unicode were implemented.
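For reference, the two width schemes described in the quoted text can be observed with a rough sketch like the one below. It assumes CPython 3.3 or later; the helper names utf8_width and storage_width are made up for this illustration, and sys.getsizeof reports implementation-specific sizes, so other interpreters may behave differently.

```python
import sys


def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs to encode a single code point."""
    return len(ch.encode("utf-8"))


def storage_width(ch: str, n: int = 1000) -> int:
    """Estimate the bytes per character CPython uses for a string made of ch.

    Comparing two lengths cancels out the fixed object header.
    This relies on CPython's internal layout and is only an estimate.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


# Hypothetical sample characters, one per width class.
samples = {
    "ASCII": "a",
    "Latin-1": "\u00e9",
    "BMP": "\u20ac",
    "astral": "\U0001F600",
}

for label, ch in samples.items():
    print(f"{label:8} U+{ord(ch):04X}  "
          f"utf-8 bytes: {utf8_width(ch)}  "
          f"in-memory bytes/char: {storage_width(ch)}")

# Promotion: appending one wider character widens every character in the
# resulting string, which is the case the benchmarks in this thread target.
narrow = "a" * 1000
wide = narrow + "\u20ac"
print(sys.getsizeof(narrow), sys.getsizeof(wide))
```

The UTF-8 column varies per character, while the in-memory column is fixed for a whole string and is set by its widest character.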
------
Sorry, but you do not understand Unicode: what a Unicode
Transformation Format (UTF) is, what the goal of a UTF is, and
why it is important for an implementation to work with a UTF.
A short example: writing an editor properly with something like
the FSR is simply impossible.
jmf