[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Wed Jan 8 14:56:37 CET 2014

On Tue, Jan 7, 2014 at 10:36 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Daniel Holth writes:
>
>  > Isn't it true that if you have bytes > 127 or surrogate escapes then
>  > encoding to latin1 is no longer as fast as memcpy?
>
> Be careful.  As phrased, the question makes no sense.  You don't "have
> bytes" when you are encoding, you have characters.
>
> If you mean "what happens when my str contains characters in the range
> 128-255?", the answer is encoding a str in 8-bit representation to
> latin1 is effectively memcpy.  If you read in latin1, it's memcpy all
> the way (unless you combine it with a non-latin1 string, in which case
> you're in the cases below).
>
> If you mean "what happens when my str contains characters in the range
>> 255", you have to truncate 16-bit units to 8 bit units; no memcpy.
>
> Surrogates require >= 16 bits; no memcpy.

That is neat.