7 Jan
2014
3:36 p.m.
Daniel Holth writes:
> Isn't it true that if you have bytes > 127 or surrogate escapes then encoding to latin1 is no longer as fast as memcpy?
Be careful. As phrased, the question makes no sense. You don't "have bytes" when you are encoding; you have characters.

If you mean "what happens when my str contains characters in the range 128-255?", the answer is that encoding a str held in the 8-bit representation to latin1 is effectively memcpy. If you read in latin1, it's memcpy all the way (unless you combine it with a non-latin1 string, in which case you're in the cases below).

If you mean "what happens when my str contains characters in the range > 255?", you have to truncate 16-bit units to 8-bit units; no memcpy.
Surrogates require >= 16 bits; no memcpy.
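
To make the three cases concrete, here is a rough sketch that estimates how many bytes per character CPython uses to store a str. It assumes CPython 3.3 or later (the flexible string representation) and relies on sys.getsizeof, which is implementation-specific; the helper name char_width is made up for this illustration.

```python
import sys

def char_width(ch, n=1000):
    """Estimate the bytes per character CPython uses for a str made of `ch`.

    Hypothetical helper: comparing two lengths cancels the fixed object header.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# A character in the range 128-255: stored in the 8-bit representation,
# so encoding to latin1 copies the bytes unchanged.
latin1_char = "\xe9"
print(char_width(latin1_char), latin1_char.encode("latin-1"))

# A character in the range > 255: needs 16-bit storage units.
wide_char = "\u0100"
print(char_width(wide_char))

# A surrogate escape: the undecodable byte 0x80 becomes the lone
# surrogate U+DC80, which also needs 16-bit storage units.
escaped = b"\x80".decode("ascii", "surrogateescape")
print(char_width(escaped), ascii(escaped))
```

The widths printed are a property of CPython's internal storage, not of the language, so other implementations may report something different.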