python 2.7 and unicode (one more time)

Marko Rauhamaa marko at pacujo.net
Thu Nov 20 19:56:18 CET 2014


Michael Torrie <torriem at gmail.com>:

> Unicode can only be encoded to bytes.
> Bytes can only be decoded to unicode.

I don't really like how Unicode is equated with text, or even with
character strings.

There's barely any difference between the truth values of these
statements:

   Python strings are ASCII.

   Python strings are Latin-1.

   Python strings are Unicode.

Each of those statements is true as long as you stay within the
respective character sets, and cease to be true when your text contains
characters outside the character sets.
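
To make that concrete, here is a rough sketch (assuming Python 3.3 or
later; the helper name `fits` and the sample strings are made up for
illustration) that checks which sample strings stay within ASCII,
Latin-1, and Unicode as encoded by UTF-8:

```python
# Sample strings: pure ASCII, one Latin-1 character, one character
# (the euro sign, U+20AC) that lies outside Latin-1.
samples = [u"hello", u"caf\u00e9", u"\u20ac5"]
codecs = ("ascii", "latin-1", "utf-8")


def fits(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Map each sample to its [ascii, latin-1, utf-8] verdicts.
results = {s: [fits(s, c) for c in codecs] for s in samples}

for s in samples:
    print(repr(s), results[s])
```

Each claim holds for a sample only while the sample stays inside the
respective character set; the code cannot, of course, show a character
that lies outside Unicode itself.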

Now, it is true that Python currently limits itself to the 1,114,112
Unicode code points. And it likely won't adopt more characters unless
Unicode does it first. However, text is something more lofty and
abstract than a sequence of Unicode code points.
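
The figure can be checked from the interpreter. A minimal sketch,
assuming Python 3.3 or later (on Python 2.7 the value of sys.maxunicode
depends on how the interpreter was built); the names TOTAL_CODE_POINTS
and beyond_unicode_rejected are my own:

```python
import sys

# Unicode defines 17 planes of 65,536 code points each.
TOTAL_CODE_POINTS = 17 * 0x10000

# The highest code point a Python string can hold.
highest = chr(sys.maxunicode)


def beyond_unicode_rejected():
    """Return True if Python refuses a code point just past the Unicode range."""
    try:
        chr(sys.maxunicode + 1)
    except ValueError:
        return True
    return False


print(TOTAL_CODE_POINTS, hex(sys.maxunicode), beyond_unicode_rejected())
```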

We shouldn't call strings Unicode any more than we call numbers IEEE or
times ISO.


Marko


