[Python-Dev] Divorcing str and unicode (no more implicit conversions).
Phillip J. Eby
pje at telecommunity.com
Mon Oct 24 04:23:40 CEST 2005
At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote:
>Folks, please focus on what Python 3000 should do.
>
>I'm thinking about making all character strings Unicode (possibly with
>different internal representations a la NSString in Apple's Objective
>C) and introduce a separate mutable bytes array data type. But I could
>use some validation or feedback on this idea from actual
>practitioners.
+1. Chandler has been going through quite an upheaval to get its Unicode
handling together. Having a bytes type would be great, as long as there
was support for files and sockets to produce bytes instead of strings
(unless an encoding was specified).
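For illustration only, here is a minimal sketch of the file behaviour asked for above. It is written against Python 3 semantics, which postdate this message and are assumed here rather than taken from the proposal. A binary open yields bytes, and text comes back only when an encoding is named. The file name and sample text are made up for the example.

```python
import os
import tempfile

# Hypothetical sample data; any non-ASCII text would do.
text = "caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")

    # Write the encoded form so there is something to read back.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Binary open: the result is bytes, with no decoding applied.
    with open(path, "rb") as f:
        raw = f.read()

    # Text open with an explicit encoding: the result is a Unicode string.
    with open(path, "r", encoding="utf-8") as f:
        decoded = f.read()

print(type(raw).__name__, len(raw))          # bytes 5
print(type(decoded).__name__, len(decoded))  # str 4
```

The differing lengths show why the two types should not be mixed silently: the same content is five bytes but four characters.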
I'm tempted to say it would be even better if there were a command-line
option that could be used to force all binary opens to result in bytes, and
require all text opens to specify an encoding. The Chandler i18n project
lead would jump for joy if we had a way to keep "legacy" strings out of the
system, apart from ASCII string constants found in code.
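As a rough sketch of what such a strict mode might enforce, the wrapper below rejects text opens that do not name an encoding and passes binary opens through untouched. The name `strict_open` is hypothetical, not an existing Python API or command-line option, and the code assumes Python 3 semantics, where binary mode returns bytes.

```python
import os
import tempfile


def strict_open(path, mode="r", encoding=None, **kwargs):
    """Hypothetical strict variant of open(); not part of Python itself."""
    if "b" in mode:
        # Binary opens always produce bytes, so an encoding makes no sense.
        if encoding is not None:
            raise ValueError("binary mode does not take an encoding")
        return open(path, mode, **kwargs)
    # Text opens must say how bytes map to characters.
    if encoding is None:
        raise ValueError("text mode requires an explicit encoding")
    return open(path, mode, encoding=encoding, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    with strict_open(path, "wb") as f:
        f.write("na\u00efve".encode("utf-8"))

    with strict_open(path, "rb") as f:
        raw = f.read()

    with strict_open(path, "r", encoding="utf-8") as f:
        decoded = f.read()

    # A text open without an encoding is refused before the file is touched.
    try:
        strict_open(path, "r")
        rejected = False
    except ValueError:
        rejected = True
```

A plugin that calls this wrapper cannot pick up undecoded text just by opening a file, which is the guarantee described above.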
It would then be okay not to drop support for the implicit conversions; if
you can't get legacy strings on input, then conversion's not really an issue.
Anyway, I think all of the things I'd like to see can be done without
breakage in 2.5. For Chandler at least, we'd be willing to go with a
command-line option that's more strict, in order to be able to ensure that
plugin developers can't accidentally put 8-bit strings in somewhere, just
by opening a file.