flaming vs accuracy [was Re: Performance of int/long in Python 3]

Chris Angelico rosuav at gmail.com
Thu Mar 28 15:38:07 CET 2013

On Fri, Mar 29, 2013 at 1:12 AM, jmfauth <wxjmfauth at gmail.com> wrote:
> This flexible string representation is so absurd that not only
> "it" does not know you can not write Western European Languages
> with latin-1, "it" penalizes you by just attempting to optimize
> latin-1. Shown in my multiple examples.

PEP 393 strings have two optimizations, or kinda three:

1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else

Options 1a and 1b are almost identical - I'm not sure what the detail
is, but there's something flagging those strings that fit inside seven
bits. (Something to do with optimizing encodings later?) Both are
optimized down to a single byte per character.

Option 2 is optimized to two bytes per character.

Option 3 is stored in UTF-32.
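
If you want to see the four cases for yourself, here's a rough sketch. It
assumes CPython 3.3 or later, since sys.getsizeof reports
implementation-specific sizes. The helper name bytes_per_char and the sample
characters are just mine for illustration. It compares two lengths of the
same string so the fixed object header cancels out:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for a string made only of `ch`.

    Subtracting the sizes of two strings of different lengths cancels
    the fixed object header, leaving only the per-character cost.
    Assumes CPython 3.3+ (PEP 393); other implementations may differ.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

# One sample character per category from the list above.
samples = {
    "1a ASCII": "a",                       # U+0061
    "1b Latin-1": "\xe9",                  # U+00E9
    "2 BMP": "\u20ac",                     # U+20AC
    "3 everything else": "\U0001F600",     # outside the BMP
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch))
```

On a PEP 393 build I'd expect that to report one, one, two and four bytes
per character respectively.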

Once again, jmf, you are forgetting that option 2 is a safe and
bug-free optimization.

