
On Tue, Jun 3, 2014 at 7:32 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> > Having a build-time option to restrict all strings to ASCII-only.
> > (I think what they mean by that is that strings will be like Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)
> What I was actually suggesting along those lines was that the str type still be notionally a Unicode string, but that any codepoints >127 would either raise an exception or blow an assertion, and all the code to handle multibyte representations would be compiled out.
That would be a pretty lousy option.
> So there'd still be a difference between strings of text and streams of bytes, but all encoding and decoding to/from ASCII-compatible encodings would just point to the same bytes in RAM.
I suppose this is why you propose to reject 128-255?
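A quick way to see why the cutoff sits at 127 rather than 255: only for code points 0-127 do ASCII, Latin-1 and UTF-8 produce byte-identical output, so only in that range could "encoding" just reuse the same buffer. The sketch below compares the three codecs. The helper name `encodings_agree` is hypothetical, written for illustration.

```python
# Codecs an ASCII-only build might want to treat as "free" (no copy needed).
CODECS = ("ascii", "latin-1", "utf-8")

def encodings_agree(text: str) -> bool:
    """Hypothetical helper: True if every codec in CODECS yields identical bytes."""
    encoded = []
    for codec in CODECS:
        try:
            encoded.append(text.encode(codec))
        except UnicodeEncodeError:
            return False  # this codec cannot represent the text at all
    return all(b == encoded[0] for b in encoded)

print(encodings_agree("hello"))       # True: every code point is < 128
print("\u00e9".encode("latin-1"))     # b'\xe9': one byte in Latin-1
print("\u00e9".encode("utf-8"))       # b'\xc3\xa9': two bytes in UTF-8
print(encodings_agree("caf\u00e9"))   # False
```

Code point U+00E9 is a single byte in Latin-1 but two bytes in UTF-8. Allowing 128-255 would therefore break the "same bytes in RAM" shortcut for at least one common codec.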
> Risk: Someone would implement that with assertions, then compile with assertions disabled, test only with ASCII, and have lurking bugs.
Never mind disabling assertions -- even with enabled assertions you'd have to expect most Python programs to fail with non-ASCII input.
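To make the assertion-vs-exception distinction concrete, here is a minimal sketch of the two ways such a build might guard against code points above 127. Both function names are hypothetical. In CPython, running with `python -O` strips `assert` statements, so only the explicit `raise` survives. Either way, ordinary non-ASCII input such as a contributor's name is rejected.

```python
def check_ascii_assert(s: str) -> str:
    # Hypothetical guard: removed entirely under `python -O`, so non-ASCII slips through.
    assert s.isascii(), "non-ASCII code point"
    return s

def check_ascii_raise(s: str) -> str:
    # Hypothetical guard: always active, rejects any code point > 127.
    if not s.isascii():
        bad = next(ch for ch in s if ord(ch) > 127)
        raise ValueError(f"code point U+{ord(bad):04X} not allowed in ASCII-only build")
    return s

for name in ("Guido", "\u0141ukasz"):
    try:
        check_ascii_raise(name)
        print(name, "ok")
    except ValueError as exc:
        print(name, "rejected:", exc)
```

`str.isascii()` requires Python 3.7 or later. The second name fails on U+0141, which is exactly the kind of input a test suite that uses only ASCII would never exercise.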
Then again the UTF-8 option would be pretty devastating too for anything manipulating strings (especially since many Python APIs are defined using indexes, e.g. the re module).
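The cost of the UTF-8 option comes from index-based APIs. Assume a string is stored internally as UTF-8 bytes; this is an assumption about the proposed build, not how CPython works. Then turning a code-point index, such as the one `re` returns, into a byte position means scanning from the start of the buffer. The sketch below shows that scan. `utf8_offset` is a hypothetical helper.

```python
import re

def utf8_offset(buf: bytes, index: int) -> int:
    """Hypothetical helper: byte offset of code point number `index` in UTF-8 `buf`.

    Runs in O(n), because UTF-8 is variable-width and must be scanned from the start.
    """
    count = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if count == index:
                return offset
            count += 1
    if count == index:
        return len(buf)  # index just past the last code point
    raise IndexError("code point index out of range")

text = "na\u00efve caf\u00e9 bar"
start = re.search("bar", text).start()  # a code-point index (11)
buf = text.encode("utf-8")
off = utf8_offset(buf, start)           # the matching byte offset (13)
print(start, off)
```

The two indices differ because "ï" and "é" each take two bytes. Every slice, `find()` result or match span would pay this linear conversion, unless the implementation adds extra bookkeeping.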
Why not support variable-width strings like CPython 3.4?
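For reference, the variable-width scheme CPython uses (PEP 393, in CPython since 3.3) picks one storage width per string object, based on the largest code point it contains. Within a string every code point has the same width, so indexing stays O(1), and pure-ASCII or Latin-1 text costs one byte per character. The sketch below reproduces only the width choice. The function name is hypothetical, and real CPython objects also carry header overhead and other details omitted here.

```python
def pep393_width(s: str) -> int:
    """Hypothetical helper: bytes per code point CPython would pick for s under PEP 393."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1  # all code points in Latin-1 range: 1 byte each
    if top < 0x10000:
        return 2  # all code points in the BMP: 2 bytes each (UCS-2)
    return 4      # at least one astral code point: 4 bytes each (UCS-4)

for s in ("hello", "caf\u00e9", "\u0141ukasz", "\U0001F40D"):
    print(repr(s), pep393_width(s))
```

A single character above U+FFFF widens the whole string to 4 bytes per code point. That is the trade-off PEP 393 makes to keep constant-time indexing.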