On Tue, 3 Jun 2014 22:23:07 -0700, Guido van Rossum <email@example.com> wrote:
> Never mind disabling assertions -- even with enabled assertions you'd have to expect most Python programs to fail with non-ASCII input.
> Then again the UTF-8 option would be pretty devastating too for anything manipulating strings (especially since many Python APIs are defined using indexes, e.g. the re module).
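To make the index concern concrete: in Python 3, `re` match positions such as `m.start()` count code points, not bytes. If an implementation stored `str` internally as UTF-8 (an assumption for illustration; CPython does not store strings this way), turning such an index into a storage offset means scanning from the start of the buffer. A minimal sketch; `utf8_byte_offset` is a hypothetical helper, not part of any Python API:

```python
import re


def utf8_byte_offset(data: bytes, cp_index: int) -> int:
    """Hypothetical helper: byte offset of code point number cp_index in UTF-8 data.

    Needs a linear scan, because UTF-8 code points are 1 to 4 bytes wide.
    """
    count = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if count == cp_index:
                return offset
            count += 1
    if count == cp_index:
        return len(data)  # index just past the last code point
    raise IndexError("code point index out of range")


text = "naïve café"
m = re.search("café", text)   # m.start() is a code-point index
raw = text.encode("utf-8")    # what a UTF-8-based str would hold internally
print(m.start(), utf8_byte_offset(raw, m.start()))
```

Here the code-point index is 6 but the byte offset is 7, because "ï" takes two bytes in UTF-8. Each such conversion is O(n) rather than O(1) unless the implementation keeps some auxiliary index.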
If Unicode is slow (*), then the obvious choice is not to use Unicode when it isn't needed. Too bad that's a bit hard in Python 3, as it enforces Unicode everywhere, and dealing with efficient strings requires prefixing them with funny characters like "b", etc.
* If Unicode is slow because it makes the heap bloat and spill into swap, the choice is still the same.
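For reference, the "b"-prefixed route in Python 3 looks roughly like this: `bytes` literals, indexing that yields integers, `re` patterns that must also be `bytes`, and an explicit `decode()` only where real text is needed. A small sketch; the request line is made-up sample data:

```python
import re

# Made-up sample data, kept as bytes end to end.
line = b"GET /index.html HTTP/1.1"

# A bytes subject needs a bytes pattern (rb"..."); a str pattern raises TypeError.
m = re.match(rb"(\w+) (\S+)", line)
method, path = m.group(1), m.group(2)

# Indexing bytes yields an int (the byte value), not a 1-character string.
first = line[0]

# Decode only at the boundary where text is actually required.
path_text = path.decode("ascii")

# str and bytes do not mix implicitly.
try:
    line + "!"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The cost is visible in the code: every literal and pattern carries the prefix, and conversions between the two types must be spelled out.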
> Why not support variable-width strings like CPython 3.4?
Because, like a good deal of the community, we hope that Python 4 will get back to reality, and strings will be efficient (both for processing and storage) by default, and the niche and marginal "Unicode string" type will be used explicitly (using funny prefixes, etc.), only when really needed.
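The variable-width representation asked about above is CPython's flexible string storage (PEP 393, since 3.3): each `str` uses 1, 2 or 4 bytes per code point depending on its widest character, while `len()` and indexing keep counting code points. A rough way to observe it, assuming CPython; `bytes_per_char` is a hypothetical helper, and `sys.getsizeof` figures are implementation details that other interpreters may report differently or not support:

```python
import sys


def bytes_per_char(ch: str, n: int = 100) -> int:
    """Estimate storage bytes per code point for a string of repeated `ch`.

    Differencing two string lengths cancels the fixed object header,
    leaving only the per-character cost (CPython-specific).
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


# ASCII, a BMP character (euro sign), and an astral character (emoji).
for ch in ("a", "\u20ac", "\U0001F600"):
    print(repr(ch), bytes_per_char(ch))
```

On CPython this should report 1, 2 and 4 bytes per code point respectively. The trade-off is that one wide character widens the storage of the whole string, but indexing stays O(1).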
Ah, all these not-so-funny geek jokes about the internals of language implementation; I hope they didn't make anyone's day dull!
> --
> --Guido van Rossum (python.org/~guido)