[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Stephen J. Turnbull stephen at xemacs.org
Tue Jan 7 16:36:48 CET 2014


Daniel Holth writes:

 > Isn't it true that if you have bytes > 127 or surrogate escapes then
 > encoding to latin1 is no longer as fast as memcpy?

Be careful.  As phrased, the question makes no sense.  You don't "have
bytes" when you are encoding; you have characters.

If you mean "what happens when my str contains characters in the range
128-255?", the answer is that encoding a str held in the 8-bit
representation to latin1 is effectively memcpy.  If you read the data
in as latin1, it's memcpy all the way (unless you combine it with a
non-latin1 string, in which case you're in the cases below).

If you mean "what happens when my str contains characters above 255?",
you have to truncate 16-bit units to 8-bit units; no memcpy.

Surrogates require >= 16 bits; no memcpy.
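
The wider cases can be sketched as follows, assuming CPython 3.3 or
later.  `unit_width` is a hypothetical helper, not a stdlib function: it
infers the per-character storage width from `sys.getsizeof`, which is an
implementation detail rather than a language guarantee.  The second half
shows a surrogate escape being rejected by strict latin1 encoding and
restored by the `surrogateescape` error handler.

```python
import sys

def unit_width(ch):
    """Estimate how many bytes per character are used to store a str of `ch`.

    Hypothetical helper: compares the sizes of two freshly built strings
    that differ by 100 characters, so the fixed object header cancels out.
    """
    return (sys.getsizeof(ch * 201) - sys.getsizeof(ch * 101)) // 100

widths = {
    "latin1 char (U+00E9)": unit_width("\xe9"),
    "char above 255 (U+0100)": unit_width("\u0100"),
    "lone surrogate (U+DCFF)": unit_width("\udcff"),
}

# A byte that ASCII cannot decode becomes a lone surrogate under surrogateescape.
escaped = b"\xff".decode("ascii", "surrogateescape")
assert escaped == "\udcff"

# Strict latin1 encoding rejects the surrogate ...
try:
    escaped.encode("latin1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ... while the surrogateescape handler turns it back into the original byte.
restored = escaped.encode("latin1", "surrogateescape")
```

On CPython 3.3 or later the three widths should come out as 1, 2 and 2
bytes: only the first string is already stored in 8-bit units.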

