python 2.7 and unicode (one more time)

Marko Rauhamaa marko at
Mon Nov 24 07:57:33 CET 2014

Gregory Ewing <greg.ewing at>:
> Marko Rauhamaa wrote:
>> Unicode strings is not wrong but the technical emphasis on Unicode is as
>> strange as a "tire car" or "rectangular door" when "car" and "door" are
>> what you usually mean.
> The reason Unicode gets emphasised so much is that until relatively
> recently, it *wasn't* what "string" usually meant in Python.
> When Python 3 has been around for as long as Python 2 was, things may
> change.

Yes, people call strings "Unicode strings" because Python2 *did have*
unicode strings separate from regular strings:

    Python2            Python3
    string             bytes (byte string)
    unicode string     string
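
A minimal sketch of the Python3 column, assuming a Python3 interpreter
(the sample word and the variable names are made up for illustration):

```python
# Python 3 only: "string" (str) is text, "bytes" is the separate byte string.
text = "na\u00efve"            # str: five Unicode code points ("naive" with a diaeresis on the i)
data = text.encode("utf-8")    # bytes: the UTF-8 encoded octets of that text

print(type(text).__name__, len(text))   # the str type and its length in characters
print(type(data).__name__, len(data))   # the bytes type and its length in octets

# Decoding the bytes with the same codec gives back the original string.
roundtrip = data.decode("utf-8")
```

The two lengths differ because the one non-ASCII character takes two
bytes in UTF-8, which is the whole point of keeping the types apart.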

In Python2 days, Unicode was a fancy, exotic datatype for the
connoisseurs. The rest used strings. Python3 supposedly elevates Unicode
to boring normalcy. Now it's bytes that have fallen into (unmerited)

But old habits die hard; you call cars "automobile cars" instead of
"cars" since, after all, "cars" were always pulled by horses...


PS Maybe interestingly, Guile went through an analogous transition. As
of Guile 2.0,

  a character is anything in the Unicode Character Database.
  Strings are fixed-length sequences of characters.
  A bytevector is a raw bit string.


However, Guile 1.8 still had:

  The Guile implementation of character sets currently deals only with
  8-bit characters.


and there were no bytevectors.
