
12 Oct 2016, 3:25 p.m.
On 10/12/2016 5:57 PM, Elliot Gorokhovsky wrote:
On Wed, Oct 12, 2016 at 3:51 PM Nathaniel Smith <njs@pobox.com> wrote:
But this isn't relevant to Python's str, because Python's str never uses UTF-8.
Really? I thought in Python 3, strings are all Unicode...
They are ...
so what encoding do they use, then?
Since 3.3, essentially ASCII, Latin-1, UTF-16 without surrogates (UCS-2), or UTF-32, depending on the highest code point. This is the 'kind' field. If we go this route, I suspect that optimizing string sorting will take some experimentation. If the initial item is a str, it might be worthwhile to record the highest 'kind' during the type scan, so that strncmp can be used if all are ASCII or Latin-1.
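A rough pure-Python sketch of that pre-scan, assuming each string's kind can be inferred from its highest code point (CPython does not expose the 'kind' field to Python code). The names str_kind, scan_for_sort and sort_strings are hypothetical, and comparing Latin-1 encoded bytes stands in for a C-level byte comparison. In pure Python this is not faster than a plain sorted(); it only shows the logic. One detail worth noting: Python strings may contain NUL characters, and strncmp stops at the first NUL, whereas bytes comparison (like memcmp with a length tie-break) does not.

```python
def str_kind(s: str) -> int:
    """Hypothetical helper: bytes per code point a PEP 393 string would use,
    inferred from the highest code point (empty strings count as kind 1)."""
    highest = max(map(ord, s), default=0)
    if highest < 0x100:
        return 1  # ASCII or Latin-1: one byte per code point
    if highest < 0x10000:
        return 2  # UCS-2: two bytes per code point
    return 4      # UCS-4 / UTF-32: four bytes per code point


def scan_for_sort(items):
    """Hypothetical type scan: returns (all_str, max_kind).

    Mirrors the idea of checking that every item is exactly str while
    recording the largest kind seen along the way.
    """
    if not items or type(items[0]) is not str:
        return False, 0
    max_kind = 1
    for x in items:
        if type(x) is not str:
            return False, 0  # mixed types: fall back to generic comparison
        max_kind = max(max_kind, str_kind(x))
    return True, max_kind


def sort_strings(items):
    """Hypothetical sort entry point choosing a comparison strategy."""
    all_str, kind = scan_for_sort(items)
    if all_str and kind == 1:
        # Every code point fits in one byte, so comparing the Latin-1 bytes
        # gives the same order as comparing code points, NULs included.
        return sorted(items, key=lambda s: s.encode("latin-1"))
    return sorted(items)  # generic path for wider kinds or mixed types
```

For example, sort_strings(["b", "a\x00z", "a", "\xe9"]) takes the one-byte path and returns the same list as sorted() on the same input.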
--
Terry Jan Reedy