[Python-Dev] thoughts on the bytes/string discussion

Thu Jun 24 19:38:19 CEST 2010

Here are a couple of ideas I'm taking away from the bytes/string
discussion.

First, it would probably be a good idea to have a String ABC.

Secondly, maybe the string situation in 2.x wasn't as broken as we
thought it was.  In particular, those who deal with lots of encoded
strings seemed to find it handy, and miss it in 3.x.  Perhaps strings
are more like numbers than we think.  We have separate types for int,
float, Decimal, etc.  But they're all numbers, and they all
cross-operate.  In 2.x, it seems there were two problems: str had no
encoding attribute, which should have been there and should have been
required, and the default encoding was "ASCII" (I can't tell you how
many times I've had to fix that issue when a non-ASCII encoded str
was passed to some output function).

So maybe having a second string type in 3.x that consists of an encoded
sequence of bytes plus the encoding, call it "estr", wouldn't have been
a bad idea.  It would probably have made sense to have estr cooperate
with the str type, in the same way that two different kinds of numbers
cooperate, "promoting" the result of an operation only when necessary.
This would automatically achieve the kind of polymorphic functionality
that Guido is suggesting, but without losing the ability to do

  x = e(ASCII)"bar"
  a = ''.join(["foo", x])

(or whatever the syntax for such an encoded string literal would be --
I'm not claiming this is a good one) which I presume would bind "a" to a
Unicode string "foobar" -- we'd have to work out what gets promoted to
what.
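
To make the promotion idea a bit more concrete, here is a rough sketch in
today's Python 3.  Everything in it is hypothetical: the class name "estr"
comes from the text above, but the constructor signature, the "data" and
"encoding" attributes, and the promotion rule (same encoding stays estr,
anything else becomes str) are assumptions made for illustration, not a
worked-out design.

```python
import codecs


class estr:
    """Hypothetical encoded string: bytes plus the codec that produced them."""

    def __init__(self, data, encoding):
        # Normalize the codec name so "ASCII" and "ascii" compare equal.
        self.encoding = codecs.lookup(encoding).name
        if isinstance(data, str):
            data = data.encode(self.encoding)
        self.data = bytes(data)
        # Decode once up front so bad bytes fail here, not at first use.
        self.data.decode(self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)

    def __repr__(self):
        return "estr(%r, %r)" % (self.data, self.encoding)

    def __add__(self, other):
        if isinstance(other, estr):
            if other.encoding == self.encoding:
                # Same encoding: no promotion needed, result stays an estr.
                return estr(str(self) + str(other), self.encoding)
            # Different encodings: promote the result to a Unicode str.
            return str(self) + str(other)
        if isinstance(other, str):
            # Mixed estr/str: promote to str (an assumed rule).
            return str(self) + other
        return NotImplemented

    def __radd__(self, other):
        # Reached for str + estr, since str knows nothing about estr.
        if isinstance(other, str):
            return other + str(self)
        return NotImplemented


x = estr(b"bar", "ASCII")
a = "foo" + x                   # promoted: a plain str, "foobar"
b = x + estr(b"baz", "ascii")   # not promoted: still an estr
```

The sketch uses "+" rather than ''.join because the built-in str.join
accepts only real str instances; making join cooperate with a second string
type would need support in str itself.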

The language moratorium kind of makes this all theoretical, but building
a String ABC still would be a good start, and presumably isn't forbidden
by the moratorium.
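
For what it's worth, a String ABC could start out very small.  The sketch
below is only a guess at the shape: the name "String", the choice of
Sequence as a base, and "encode" as the single required method are all
assumptions, and nothing like this exists in the standard library.

```python
from abc import abstractmethod
from collections.abc import Sequence


class String(Sequence):
    """Hypothetical ABC for string-like types (not a stdlib class)."""

    __slots__ = ()

    @abstractmethod
    def encode(self, encoding="utf-8", errors="strict"):
        """Return the text as bytes in the given encoding."""


# Register the built-in text type as a virtual subclass.
String.register(str)


def is_text(obj):
    # Code can test against the ABC instead of the concrete str type.
    return isinstance(obj, String)
```

With this, is_text("abc") is true and is_text(b"abc") is false, and any
other string type could opt in through String.register or by subclassing.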

Bill