
Nick Coghlan <ncoghlan@gmail.com> wrote:
> Perhaps it is time to resurrect the idea of an explicit 'ascii' type? Add a'' literals, support the full string API as well as the bytes API, deprecate all string APIs on bytes and bytearray objects.
>
> The other thing I have learned in trying to deal with some of these issues is that ASCII-encoded text really *is* special, compared to all other encodings, due to its widespread use in a multitude of networking protocols and other formats.
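A rough approximation of the quoted 'ascii' type idea can be written in pure Python today, minus the a'' literal syntax. The class name `Ascii` is hypothetical. It subclasses `str`, so it keeps the full string API. It rejects non-ASCII input, and it adds a couple of bytes-style operations (`__bytes__`, `hex`). This is only a sketch, not a proposed implementation:

```python
class Ascii(str):
    """Hypothetical 'ascii' type: a str restricted to ASCII characters
    that also offers a few bytes-style operations."""

    def __new__(cls, value=""):
        if isinstance(value, (bytes, bytearray)):
            # Accept raw bytes, but only if every byte is ASCII.
            if not value.isascii():
                raise ValueError("non-ASCII byte in input")
            value = value.decode("ascii")
        elif not isinstance(value, str):
            raise TypeError("expected str, bytes or bytearray")
        elif not value.isascii():
            raise ValueError("non-ASCII character in input")
        return super().__new__(cls, value)

    def __bytes__(self):
        # Lets bytes(a) return the ASCII encoding directly.
        return self.encode("ascii")

    def hex(self):
        # A bytes-API method that plain str does not have.
        return self.encode("ascii").hex()
```

One limitation of the subclass approach is that inherited methods such as `upper()` return a plain `str` rather than an `Ascii`. A real built-in type would have to preserve its own type across the string API.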
I like the deprecations you suggest, but I'd prefer to see a more general solution: extend the 'str' type so that it has two possible representations for strings, the current format and an "encoded" format, which would be kept as an array of bytes plus an encoding. It would transcode only as necessary; for example, the 're' module might require the current Unicode encoding. An explicit method would be added to allow the user to force transcoding.

This would complicate life at the C level, to be sure, though perhaps not so much, given the proper macrology.

Bill
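Below is a minimal pure-Python sketch of the dual-representation idea, written as a standalone class rather than a change to the built-in `str`. The name `DualStr` and its methods `as_text`, `as_bytes`, `search` and `transcode` are hypothetical. The object holds either decoded text or raw bytes plus an encoding name. It decodes lazily, only when an operation needs text, and it returns the stored bytes unchanged when the requested encoding matches:

```python
import codecs
import re


class DualStr:
    """Hypothetical string holding either decoded text or raw bytes plus
    an encoding, transcoding only when an operation requires it."""

    def __init__(self, text=None, *, data=None, encoding=None):
        if (text is None) == (data is None):
            raise ValueError("pass exactly one of text or data")
        if data is not None and encoding is None:
            raise ValueError("the encoded form needs an encoding")
        self._text = text          # decoded form, or None if not yet decoded
        self._data = data          # encoded form, or None
        self._encoding = encoding  # encoding of self._data

    @property
    def is_decoded(self):
        return self._text is not None

    def as_text(self):
        """Return the decoded str, decoding once and caching the result."""
        if self._text is None:
            self._text = self._data.decode(self._encoding)
        return self._text

    def as_bytes(self, encoding):
        """Return bytes in `encoding`, reusing the stored bytes if the
        encodings match (compared by their normalized codec names)."""
        if (self._data is not None
                and codecs.lookup(encoding).name
                == codecs.lookup(self._encoding).name):
            return self._data
        return self.as_text().encode(encoding)

    def search(self, pattern):
        # Matching a str pattern needs decoded text, so this forces decoding.
        return re.search(pattern, self.as_text())

    def transcode(self):
        """Explicitly force conversion to the decoded representation."""
        self.as_text()
        return self

    def __len__(self):
        # Character count requires decoding for multi-byte encodings.
        return len(self.as_text())
```

For example, `DualStr(data=b"caf\xe9", encoding="latin-1").as_bytes("ISO-8859-1")` hands back the stored bytes without ever decoding them, while calling `search` or `len` on the same object triggers a one-time decode. A C implementation would face the same design question of which operations can work on the encoded form directly.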