I think that maybe an important point is getting lost here. I could be wrong, but it seems that all of this emphasis on encodings is misplaced.
In practical applications that manipulate text, encodings creep up all the time. I remember a talk or message by Andy Robinson about the messiness of producing printed reports in Japanese for a large investment firm. Most of the issues that took his time had to do with encodings, if I recall correctly. (Andy, do you remember what I'm talking about? Do you have a URL?)
The truth of the matter is: the encoding of string objects is in the mind of the programmer. When I read a GIF file into a string object, the encoding is "binary goop".
IMHO, it's a mistake of history that you would even think it makes sense to read a GIF file into a "string" object, and we should be trying to erase that mistake as quickly as possible (which is admittedly not very quickly), not building more and more infrastructure around it. How can we make the transition to a "binary goops are not strings" world easiest?
I'm afraid that's a bigger issue than we can solve for Python 1.6. We're committed to backwards compatibility, by and large, while supporting Unicode -- the backwards compatibility with tons of extension modules (many 3rd party) requires that we deal with 8-bit strings in basically the same way as we did before.
The moral of all this? 8-bit strings are not going away.
If that is a statement of your long term vision, then I think that it is very unfortunate. Treating string literals as if they were isomorphic with byte arrays was probably the right thing in 1991 but it won't be in 2005.
I think you're a tad too optimistic about the evolution speed of software (Windows 2000 *still* has to support DOS programs), but I see your point. As I stated in another message, in Python 3000 we'll have to consider a more Java-esque solution: *character* strings are Unicode, and for bytes we have (mutable!) byte arrays. Certainly 8-bit bytes as the smallest storage unit aren't going away.
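To make the character-string / byte-array split concrete, here is a minimal sketch. It assumes present-day Python 3 semantics (str for Unicode text, bytearray for mutable bytes), which this message only anticipates; the variable names are illustrative.

```python
# Assumption: Python 3, where str holds Unicode characters and
# bytearray holds mutable 8-bit values. Names are illustrative only.
text = "\u65e5\u672c\u8a9e"               # three Japanese characters
data = bytearray(text.encode("utf-8"))    # the same text as raw bytes

n_chars = len(text)    # counts characters
n_bytes = len(data)    # counts 8-bit storage units

data[0] = 0x00         # a byte array can be modified in place

# Mixing the two types is rejected rather than silently coerced.
try:
    text + data
    mixing_error = None
except TypeError as exc:
    mixing_error = exc
```

The point of the sketch is only that the two lengths differ and that the two types do not mix implicitly; whichever encoding produced the bytes has to be named explicitly.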
It doesn't meet the definition of string used in the Unicode spec., nor in XML, nor in Java, nor at the W3C nor in most other up and coming specifications.
OK, so that's a good indication of where you're coming from. Maybe you should spend a little more time in the trenches and a little less in standards bodies. Standards are good, but sometimes disconnected from reality (remember ISO networking? :-).
From the W3C site:
"While ISO-2022-JP is not sufficient for every ISO10646 document, it is the case that ISO10646 is a sufficient document character set for any entity encoded with ISO-2022-JP."
And this is exactly why encodings will remain important: entities encoded in ISO-2022-JP have no compelling reason to be recoded permanently into ISO10646, and there are lots of forces that make it convenient to keep it encoded in ISO-2022-JP (like existing tools).
I know that document well.
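The W3C sentence quoted above can be checked in a few lines. This is a rough sketch, assuming Python 3 and its standard "iso2022_jp" codec; the sample strings are illustrative and not taken from the thread.

```python
# Assumption: Python 3 with the standard "iso2022_jp" codec.
japanese = "\u65e5\u672c\u8a9e"           # sample Japanese text

# Unicode (ISO 10646) is sufficient for ISO-2022-JP data:
# encoding and then decoding gives back the same characters.
encoded = japanese.encode("iso2022_jp")
round_trip_ok = encoded.decode("iso2022_jp") == japanese

# ISO-2022-JP is a 7-bit encoding that switches character sets
# with escape sequences, so every byte stays below 0x80.
seven_bit = all(b < 0x80 for b in encoded)

# The converse fails: the euro sign is a Unicode character that
# this codec cannot represent.
try:
    "\u20ac".encode("iso2022_jp")
    euro_encodable = True
except UnicodeEncodeError:
    euro_encodable = False
```

Nothing here forces the stored data to change: the bytes can stay in ISO-2022-JP for existing tools and be decoded only when a program needs to treat them as characters.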
--Guido van Rossum (home page: http://www.python.org/%7Eguido/)