
On 6/1/2011 12:34 PM, Bill Janssen wrote:
IMO, the thing that bit us on the fundament with the 2.x str/unicode divide, and continues to bite us with the 3.x str/bytes divide, is that we don't carry the encoding as part of the 2.x 'str' value (or as part of the 3.x 'bytes' value). The key here is to store the encoding internally in the string object, so that it's available to do automatic coercion when necessary, rather than *requiring* all coercions to be done manually by some program code.
Some time ago, I posted here a proposal to do just that -- add an encoding field to byte strings (or, I believe, add a new class). It was horribly shot down. The objections were something like: 'conceptually wrong; some bytes have 0 or multiple encodings; can just use an attribute or tuple; don't need it'.

-- Terry Jan Reedy
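
For readers who want to see what "carry the encoding with the bytes" could look like, here is a rough sketch only. It is not the proposal that was posted to the list, and nothing like it exists in the standard library. The class name `TaggedBytes` and its `encoding` attribute are made up for illustration; the sketch uses the "just use an attribute" route mentioned in the objections, on a 3.x `bytes` subclass.

```python
class TaggedBytes(bytes):
    """Hypothetical bytes subclass that remembers its encoding (illustration only)."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        # encoding=None models the "some bytes have no encoding" objection.
        self.encoding = encoding
        return self

    def __str__(self):
        # Coerce to text using the stored encoding, if there is one.
        if self.encoding is None:
            raise TypeError("no encoding recorded; decode explicitly")
        return self.decode(self.encoding)

    def __add__(self, other):
        # bytes + str: coerce automatically instead of raising TypeError.
        if isinstance(other, str):
            return str(self) + other
        # bytes + bytes: ordinary concatenation; the result is plain bytes,
        # so the encoding tag is lost here (a real design would have to
        # decide what to do when the two operands disagree).
        return super().__add__(other)

    def __radd__(self, other):
        # str + bytes: same coercion with the operands swapped.
        if isinstance(other, str):
            return other + str(self)
        return NotImplemented


tagged = TaggedBytes("café".encode("latin-1"), "latin-1")
mixed = tagged + " au lait"   # a str, decoded via the stored encoding
```

Even this toy version runs into the questions raised in the objections: untagged data still needs a manual decode, and concatenating two byte strings with different tags has no obvious answer.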