
On Wed, Jun 1, 2011 at 2:16 AM, Bill Janssen <janssen@parc.com> wrote:
> I like the deprecations you suggest, but I'd prefer to see a more general solution: the 'str' type extended so that it had two possible representations for strings, the current format and an "encoded" format, which would be kept as an array of bytes plus an encoding. It would transcode only as necessary -- for example, the 're' module might require the current Unicode encoding. An explicit method would be added to allow the user to force transcoding.
>
> This would complicate life at the C level, to be sure. Though, perhaps not so much, given the proper macrology.
See PEP 393 - it is basically this idea (although the encodings are fixed for the various sizes rather than allowing arbitrary encodings in the 8-bit internal format).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
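To make the contrast with the quoted proposal concrete, here is a minimal pure-Python sketch of the "fixed encoding per size" idea, assuming the scheme PEP 393 eventually settled on: 1 byte per code point when every code point is at most U+00FF (Latin-1), 2 bytes when all are at most U+FFFF (UCS-2), and 4 bytes otherwise (UCS-4). The separate ASCII-only special case is ignored. The names `kind_width`, `pack`, and `char_at` are hypothetical illustrations, not CPython's C implementation.

```python
# Sketch of PEP 393-style storage selection (assumption: modeled in pure
# Python; CPython does this in C with its own structs and macros).

def kind_width(s):
    """Bytes per code point for the narrowest fixed-width form of s."""
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return 1   # Latin-1 range
    if max_cp < 0x10000:
        return 2   # UCS-2 range (BMP only)
    return 4       # UCS-4 range

def pack(s):
    """Store s as a fixed-width little-endian buffer; returns (buffer, width)."""
    width = kind_width(s)
    buf = b"".join(ord(c).to_bytes(width, "little") for c in s)
    return buf, width

def char_at(buf, width, i):
    """O(1) indexing: a fixed width means code point i starts at i * width."""
    chunk = buf[i * width:(i + 1) * width]
    return chr(int.from_bytes(chunk, "little"))

buf, width = pack("a\U0001F600")   # one non-BMP character forces width 4
print(width, len(buf), char_at(buf, width, 1))
```

The trade-off versus an arbitrary "bytes plus encoding" form: because the width is uniform within each string, indexing never needs to scan or transcode, at the cost of widening the whole string when a single large code point appears.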