On Thu, Oct 13, 2016 at 5:17 PM, Stephen J. Turnbull firstname.lastname@example.org wrote:
Chris Angelico writes:
I'm not sure what you mean by "strcmp-able"; do you mean that the lexical ordering of two Unicode strings is guaranteed to be the same as the byte-wise ordering of their UTF-8 encodings?
This is definitely not true for the Han characters. In Japanese, the most commonly used lexical ordering is based on pronunciation, meaning that few characters in common use (perhaps none) have a unique place in the lexical ordering: most individual characters have multiple pronunciations, and so do many whole personal names.
Yeah, and even just with Latin-1 characters, you have (a) non-ASCII characters that sort between ASCII characters, and (b) characters that have different meanings in different languages, and so should be sorted differently depending on the language. So a generic, language-unaware string sort can't give you a correct lexicographical ordering.
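
To make point (a) concrete, here is a minimal sketch in Python. The word list and the `strip_accents` helper are made up for illustration. The helper is only a crude stand-in for real collation (it drops combining marks after NFD decomposition) and does nothing about point (b), which needs language-specific rules.

```python
import unicodedata

# "\u00e9" is the precomposed Latin-1 character é (U+00E9).
words = ["zebra", "\u00e9clair", "apple"]

# Plain string sort: compares code points, so é (U+00E9) lands after z (U+007A).
by_code_point = sorted(words)

# Sorting by the UTF-8 encoding gives the same order, because UTF-8 byte-wise
# comparison preserves code point order.
by_utf8_bytes = sorted(words, key=lambda s: s.encode("utf-8"))


def strip_accents(s):
    """Hypothetical helper: decompose to NFD and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Crude approximation of what a French or English reader would expect:
# é sorts alongside e, i.e. between a and z.
by_base_letter = sorted(words, key=strip_accents)

print(by_code_point)
print(by_utf8_bytes)
print(by_base_letter)
```

The first two results agree with each other and put the accented word last; only the third puts it between "apple" and "zebra". A real solution would use a collation library configured for a specific language rather than an accent-stripping key.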