
This is the exact reason that Unicode should be used for all string literals: from a language design perspective I don't understand the rationale for providing "traditional" and "unicode" strings.
In Python 3000, you would have a point. In current Python, there are simply too many programs and extensions written in other languages that manipulate 8-bit strings to ignore their existence. We're trying to add Unicode support to Python 1.6 without breaking code that used to run under Python 1.5.x; practicalities just make it impossible to go with Unicode for everything.

I think that if Python didn't have so many extension modules (many maintained by 3rd parties) it would be a lot easier to switch to Unicode for all strings (I think JavaScript has done this).

In Python 3000, we'll have to seriously consider having separate character string and byte array objects, along the lines of Java's model. Note that I say "seriously consider." We'll first have to see how well the current solution works *in practice*. There's time before we fix Py3k in stone. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)