[Python-Dev] C-API status of Python 3?

Bill Janssen janssen at parc.com
Sun Mar 2 20:39:24 CET 2008


> Why not also make unicode() the default type constructor and only
> keep str() as alias to simplify porting (perhaps with a warning) ?
> 
> The term "string" is just too overloaded with all kinds of
> misinterpretations. The term "string" just refers to a string of
> bytes - a variable length array so to speak. However, depending
> on the application space, "string" is used as synonym for
> "text string" just as well as "data string".
> 
> Removing the term "string" altogether would make it easier for
> people to understand that Py3k only has unicode (for text data)
> and bytes (for binary data).

I agree that "string" is very overloaded, but calling it "unicode" is
sort of like calling integers "int32" -- that is, you're talking about
the implementation rather than the type.  In most programming
languages that work above the machine level (unlike C), "string"
really means a sequence of text characters, not a "string of bytes",
and that's probably the term that should be used for Python going
forward, despite the legacy issues it involves.

Personally, I feel that "string" (for text) and "bytes" (for binary
data represented as a sequence of bytes) are appropriate terms for
Python.  Keep "unicode" for a release or two as an alias for "string".
But isn't all this in a PEP somewhere already?

Bill

