
Here are a couple of ideas I'm taking away from the bytes/string discussion.

First, it would probably be a good idea to have a String ABC.

Second, maybe the string situation in 2.x wasn't as broken as we thought it was. In particular, those who deal with lots of encoded strings seemed to find it handy, and miss it in 3.x. Perhaps strings are more like numbers than we think. We have separate types for int, float, Decimal, etc. But they're all numbers, and they all cross-operate. In 2.x, it seems there were two problems: str had no encoding attribute, which should have been there and should have been required, and the default encoding was "ASCII" (I can't tell you how many times I've had to fix that issue when a non-ASCII encoded str was passed to some output function).

So maybe having a second string type in 3.x that consists of an encoded sequence of bytes plus the encoding, call it "estr", wouldn't have been a bad idea. It would probably have made sense to have estr cooperate with the str type, in the same way that two different kinds of numbers cooperate, "promoting" the result of an operation only when necessary. This would automatically achieve the kind of polymorphic functionality that Guido is suggesting, but without losing the ability to do

    x = e(ASCII)"bar"
    a = ''.join(["foo", x])

(or whatever the syntax for such an encoded string literal would be -- I'm not claiming this is a good one), which I presume would bind "a" to a Unicode string "foobar" -- we'd have to work out what gets promoted to what.

The language moratorium kind of makes this all theoretical, but building a String ABC would still be a good start, and presumably isn't forbidden by the moratorium.

Bill
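
P.S. To make the promotion idea a bit more concrete, here is a rough sketch of what an "estr" could look like as an ordinary 3.x class. Everything in it is an assumption on my part: the class name, the constructor signature, and especially the promotion rule (same encoding stays estr, anything else promotes to str). It uses "+" rather than ''.join(), since the built-in join only accepts real str instances.

```python
import codecs


class estr:
    """Hypothetical 'encoded string': a byte sequence plus its encoding."""

    def __init__(self, data, encoding):
        # The encoding is required, unlike 2.x str, which carried none.
        # Normalize the name so that "ASCII" and "ascii" compare equal.
        self.encoding = codecs.lookup(encoding).name
        self.data = bytes(data)
        # Fail early if the bytes are not valid in the stated encoding.
        self.data.decode(self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)

    def __repr__(self):
        return "estr(%r, %r)" % (self.data, self.encoding)

    def __add__(self, other):
        if isinstance(other, estr):
            if other.encoding == self.encoding:
                # Same encoding: no promotion needed, result stays an estr.
                # (Assumes plain byte concatenation is valid for the
                # encoding, which is not true of e.g. BOM-prefixed UTF-16.)
                return estr(self.data + other.data, self.encoding)
            # Different encodings: promote both sides to Unicode str.
            return str(self) + str(other)
        if isinstance(other, str):
            # estr + str promotes to str.
            return str(self) + other
        return NotImplemented

    def __radd__(self, other):
        # Handles str + estr, again promoting to str.
        if isinstance(other, str):
            return other + str(self)
        return NotImplemented


x = estr(b"bar", "ASCII")
a = "foo" + x                          # promoted to a Unicode str
b = x + estr(b"baz", "ascii")          # same encoding, stays an estr
c = x + estr(b"\xe9", "latin-1")       # mixed encodings, promoted to str
```

Here "a" ends up as the str "foobar", "b" is still an estr, and "c" is a str. Whether mixed-encoding operations should promote like this, or raise instead, is exactly the "what gets promoted to what" question.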