[GvR, on string.encoding]
Marc-Andre took this idea a bit further, but I think it's not practical given the current implementation: there are too many places where the C code would have to be changed in order to propagate the string encoding information,
I may be missing something, but the encoding attr just travels with the string object, no? Like I said in my reply to MAL, I think it's undesirable to do *anything* with the encoding attr unless it's in combination with a unicode string.
and there are too many sources of strings with unknown encodings to make it very useful.
That's why the default encoding must be settable as well, as Fredrik suggested.
Plus, it would slow down 8-bit string ops.
Not if you ignore it most of the time, and just pass it along when concatenating.
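To make "just pass it along when concatenating" concrete, here is a minimal sketch in present-day Python. The class name TaggedBytes, its encoding attribute, the to_unicode helper, and the rule for mismatched tags are all hypothetical choices made for illustration; nothing in this thread specifies them.

```python
class TaggedBytes(bytes):
    """Hypothetical 8-bit string that carries an optional encoding tag."""

    def __new__(cls, data=b"", encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding  # None means "encoding unknown"
        return self

    def __add__(self, other):
        # Concatenation is ordinary byte concatenation; the tag is never
        # consulted to transform the data, it only rides along.
        other_encoding = getattr(other, "encoding", None)
        # Assumption: keep the tag only when both operands agree,
        # otherwise the result's encoding is treated as unknown.
        if self.encoding == other_encoding:
            encoding = self.encoding
        else:
            encoding = None
        return TaggedBytes(bytes(self) + bytes(other), encoding)

    def to_unicode(self, default="ascii"):
        # The tag only matters here, at the point of conversion to Unicode.
        return bytes(self).decode(self.encoding or default)


a = TaggedBytes(b"\xe9", "latin-1")
b = TaggedBytes(b"t\xe9", "latin-1")
joined = a + b            # tag is preserved: both sides are latin-1
mixed = a + b"plain"      # tag is dropped: the right side has no tag
```

The only extra work per concatenation is one attribute comparison, which is the sense in which the tag can be ignored most of the time.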
I have a better idea: rather than carrying around 8-bit strings with an encoding, use Unicode literals in your source code.
Explain that to newbies... My guess is that they will want simple 8-bit strings in their native encoding. Dunno.
If the source encoding is known, these will be converted using the appropriate codec.
If you object to having to write u"..." all the time, we could say that "..." is a Unicode literal if it contains any characters with the top bit on (of course the source file encoding would be used just like for u"...").
Only if "\377" would still yield an 8-bit string, for binary goop...
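The rule being discussed can be sketched as a small function. This is only an illustration under stated assumptions: interpret_literal is a hypothetical name, raw stands for the bytes between the quotes exactly as they appear in the source file, and escape handling in the Unicode branch is left out to keep the sketch short.

```python
def interpret_literal(raw, source_encoding):
    """Hypothetical reading of the proposed rule for plain "..." literals.

    raw: the bytes between the quotes, exactly as stored in the source file.
    source_encoding: the declared encoding of the source file.
    """
    if any(byte >= 0x80 for byte in raw):
        # A character with the top bit on appears literally in the source,
        # so treat the literal as Unicode and decode it with the source
        # file's encoding (escape sequences are not processed here).
        return raw.decode(source_encoding)
    # Pure ASCII source text, possibly with escapes such as \377:
    # process the escapes and keep the result as an 8-bit string.
    return raw.decode("unicode_escape").encode("latin-1")


binary = interpret_literal(b"\\377", "latin-1")    # escape in ASCII source
text = interpret_literal(b"caf\xe9", "latin-1")    # raw top-bit byte in source
```

Under this reading, "\377" written as an escape contains only ASCII characters in the source file, so it still yields a one-byte 8-bit string, while a literal containing a raw top-bit byte becomes Unicode.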