[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Stephen J. Turnbull stephen at xemacs.org
Thu Sep 18 06:57:40 CEST 2014

Steven D'Aprano writes:
 > On Wed, Sep 17, 2014 at 09:21:56AM +0900, Stephen J. Turnbull wrote:
 > > Guido's mantra is something like "Python's str doesn't contain
 > > characters or even code points[1], it contains code units."
 > But is that true?

It's not.  That's why I wrote the slightly pejorative "mantra" and
qualified it with "something like".  The precise statement is
"something like": the array property is more important than preserving
character boundaries, so slices etc. are allowed to do unexpected or
even evil things in the presence of astral characters in UTF-16.
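
To make the "evil things" concrete, here is a minimal sketch.  It
assumes Python 3.3 or later, where str itself indexes by code point,
so the UTF-16 code-unit array is simulated by hand.  The helper names
(utf16_code_units, decode_code_units) are mine, invented for
illustration, and are not any Python API.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def decode_code_units(units):
    """Strictly decode a list of UTF-16 code units back to str."""
    data = struct.pack("<%dH" % len(units), *units)
    return data.decode("utf-16-le")

s = "a\U0001F600b"            # U+1F600 is an astral (non-BMP) character
units = utf16_code_units(s)   # 4 code units for 3 code points

# Slicing the array by code unit cuts the surrogate pair in half:
# head holds 'a' followed by a lone high surrogate.
head = units[:2]
try:
    decode_code_units(head)
    slice_is_well_formed = True
except UnicodeDecodeError:
    slice_is_well_formed = False
```

The slice keeps the O(1) array property, but what it returns is no
longer well-formed UTF-16, so the strict decode fails.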

 > I don't understand what you are trying to say here.

 > Nor am I sure what you are trying to say here either.

We can discuss this off-list if you would like.  The natives are
getting restless.

 > > non-characters.
 > Actually not quite. "Noncharacter"

Note the hyphen!  (Just kidding, I will avoid that terminology in the
future.  I knew, but forgot.)

 > > Characters are those code points that may be assigned
 > > an interpretation as a character, including undefined characters
 > > (private space and reserved).
 > So characters are code points which are characters, including undefined 
 > characters? :-)

No, there's a clear hierarchy here.
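
One way to sketch such a hierarchy in code, assuming the Unicode
Standard's own partition of the code space and not necessarily the
exact layering I had in mind above.  The function name classify and
its labels are hypothetical.  Which code points count as "reserved"
depends on the Unicode version behind the unicodedata module you run.

```python
import unicodedata

def classify(cp):
    """Place the code point cp (an int) in a rough Unicode hierarchy."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    # Surrogates are code points, but never characters.
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points
    # of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ...).
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Co":
        return "private use"          # may be given a private meaning
    if category == "Cn":
        return "reserved"             # unassigned, may be assigned later
    return "assigned character"
```

For example, classify(0x41) gives "assigned character", while
classify(0xFFFE) gives "noncharacter" and classify(0xD800) gives
"surrogate".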
