[docs] [issue13997] Clearly explain the bare minimum Python 3 users should know about Unicode

Fri Feb 17 23:25:27 CET 2012

Terry J. Reedy <tjreedy at udel.edu> added the comment:

I agree with no new builtin and appreciate that being taken off the table.

I think the place for this is the Unicode How-to, and I think that document should be renamed "Encodings and Unicode How-to". The reasons are 1) one first has to understand the concept of encoding characters and text as numbers, and 2) this issue (and the python-ideas discussion) is not about Unicode as such, but about using pre-Unicode (and non-Unicode) encodings with Python 3's bytes and str types, and how that differs from using Python 2's unicode and str types. If only Unicode encodings were used, with UTF-8 dominant on the Internet (it is now the most common encoding for web pages), the problems of concern here would not exist.
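As a rough illustration of point 1 and of the kind of problem meant in point 2, here is a minimal Python 3 sketch. The sample word and the variable names are my own choices for illustration, not anything taken from the How-to.

```python
# Illustrative only: the sample text and variable names are assumptions.
text = "caf\u00e9"  # 'café' as a Python 3 str

# Characters as numbers: each character has a code point.
code_points = [ord(ch) for ch in text]

# The same text becomes different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# Decoding with the encoding that produced the bytes round-trips.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong encoding may silently give wrong text ...
mojibake = utf8_bytes.decode("latin-1")

# ... or may fail outright.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The bytes by themselves do not say which encoding produced them, so the program has to be told or has to guess, and a wrong guess is what produces the garbled text or the exception.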

Learning about Unicode would mean learning about code units versus code points, normal versus surrogate chars, BMP versus extended chars (all of which are non-issues in wide builds and Py 3.3), planes of 65,536 code points, BOMs, normalization forms, and character properties. While sometimes useful, these subjects are not the issue here.
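For anyone unfamiliar with those terms, a small sketch of a few of them, assuming Python 3.3 or later (or a wide build). The sample characters and variable names are arbitrary choices of mine.

```python
import unicodedata

# Illustrative only: sample characters and names are assumptions.
astral = "\U0001F600"  # a code point outside the BMP

# One code point, hence one character in Python 3.3+ ...
n_code_points = len(astral)
# ... but two 16-bit code units (a surrogate pair) in UTF-16.
n_utf16_units = len(astral.encode("utf-16-le")) // 2

# Normalization forms: the same visible character, two representations.
composed = "\u00e9"      # e with acute accent as one code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent
equal_before = composed == decomposed
equal_after = unicodedata.normalize("NFC", decomposed) == composed

# Character properties from the Unicode database.
char_name = unicodedata.name(composed)
char_category = unicodedata.category(composed)
```

None of this involves bytes or a choice of encoding for I/O, which is why I consider it a separate subject.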

----------
nosy: +terry.reedy

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13997>
_______________________________________