[Tutor] sort() method and non-ASCII
boB Stepp
robertvstepp at gmail.com
Sun Feb 5 23:27:41 EST 2017
On Sun, Feb 5, 2017 at 7:23 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Feb 05, 2017 at 04:31:43PM -0600, boB Stepp wrote:
>> On Sat, Feb 4, 2017 at 10:50 PM, Random832 <random832 at fastmail.com> wrote:
>> > On Sat, Feb 4, 2017, at 22:52, boB Stepp wrote:
> Alternatively, you can embed it right in the string. For code points
> between U+0000 and U+FFFF, use the \u escape, and for the rest, use \U
> escapes:
>
> py> 'pi = \u03C0' # requires exactly four hex digits
> 'pi = π'
>
> py> 'pi = \U000003C0' # requires exactly eight hex digits
> 'pi = π'
>
>
> Lastly, you can use the code point's name:
>
> py> 'pi = \N{GREEK SMALL LETTER PI}'
> 'pi = π'
You have surprised me here by using single quotes to enclose what
looks like an entire assignment statement. I thought this would throw
a syntax error, but it works just like you show. What is going on here?
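
A minimal sketch of what appears to be going on, assuming Python 3: the quotes turn each line into an ordinary string literal, so the "=" is just a character inside the string and nothing gets assigned. The variable names below are made up for illustration.

```python
import unicodedata

# Each right-hand side is a plain string literal; the "=" inside the
# quotes is only a character, not an assignment to a name called pi.
four_digit = 'pi = \u03C0'                  # \u takes exactly four hex digits
eight_digit = 'pi = \U000003C0'             # \U takes exactly eight hex digits
by_name = 'pi = \N{GREEK SMALL LETTER PI}'  # \N{...} takes the code point's name

# All three escapes spell the same six-character string.
print(four_digit == eight_digit == by_name)
print(type(four_digit), len(four_digit))

# The last character is the single code point U+03C0.
last = four_digit[-1]
print(hex(ord(last)), unicodedata.name(last))
```

At the interactive prompt, a bare string expression is simply echoed back, which would explain why the quoted session prints 'pi = π' instead of raising an error.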
>
> One last comment: Random832 said:
>
> "Python 3 strings are unicode-unicode, not UTF-8."
If I recall what I originally wrote (and intended), I was merely
indicating I was happy with Python 3's default UTF-8 encoding. I do
not know enough to know what these other UTF encodings offer.
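
To illustrate the distinction between a string and its encoding, here is a small sketch, assuming Python 3. The names `text` and `utf8_bytes` are only illustrative.

```python
import sys

text = '\u03C0z'  # the two-character string 'πz'

# A str is a sequence of code points, independent of any encoding.
print(len(text))
print([hex(ord(ch)) for ch in text])

# Encoding turns code points into bytes; str.encode() defaults to UTF-8.
utf8_bytes = text.encode()
print(sys.getdefaultencoding())
print(utf8_bytes, len(utf8_bytes))

# Decoding with the same encoding gives the original string back.
print(utf8_bytes.decode('utf-8') == text)
```

The string has two code points, but its UTF-8 form has three bytes, because π takes two bytes and z takes one.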
> To be pedantic, Unicode strings are sequences of abstract code points
> ("characters"). UTF-8 is a particular concrete implementation that is
> used to store or transmit such code strings. Here are examples of three
> possible encoding forms for the string 'πz':
>
> UTF-16: either two, or four, bytes per character: 03C0 007A
>
> UTF-32: exactly four bytes per character: 000003C0 0000007A
>
> UTF-8: between one and four bytes per character: CF80 7A
I have not tallied up how many code points are actually assigned to
characters. Does UTF-8 encoding currently cover all of them? If yes,
why is there a need for other encodings? Or by saying:
> (UTF-16 and UTF-32 are hardware-dependent, and the byte order could be
> reversed, e.g. C003 7A00. UTF-8 is not.)
do you mean that some hardware configurations require UTF-16 or UTF-32?
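
As a rough sketch of the byte-order point in the quoted text, assuming Python 3.5 or later (for `bytes.hex()`), the same string can be encoded with Python's standard codecs and the bytes compared. The variable names are illustrative only.

```python
text = '\u03C0z'  # the string 'πz' from the quoted example

# Encode the same two code points with explicit byte orders.
# The big-endian (-be) results match the hex shown in the quoted text;
# the little-endian (-le) results are the byte-reversed forms.
for codec in ('utf-8', 'utf-16-be', 'utf-16-le', 'utf-32-be', 'utf-32-le'):
    encoded = text.encode(codec)
    print(codec.ljust(10), encoded.hex())

# With no byte order named, the 'utf-16' codec prepends a byte order
# mark (BOM) so that a reader can tell which order was used.
with_bom = text.encode('utf-16')
print(with_bom[:2].hex())

# UTF-8 is not limited to a subset: the highest code point, U+10FFFF,
# encodes to four bytes.
highest = chr(0x10FFFF)
print(len(highest.encode('utf-8')))
```

UTF-8 has a single byte order because it works in one-byte units, whereas UTF-16 and UTF-32 use two-byte and four-byte units that can be stored in either order.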
Thank you (and the others in this thread) for taking the time to
clarify these matters.
--
boB