[docs] [issue21667] Clarify status of O(1) indexing semantics of str objects

Thu Jun 5 14:34:08 CEST 2014

Nick Coghlan added the comment:

If someone doesn't understand what "Unicode code point" means, that's going to be the least of their problems when it comes to implementing a conformant Python implementation. We could link to http://unicode.org/glossary/#code_point, but that doesn't really add much beyond "value from 0 to 0x10FFFF". If you try to dive into the formal Unicode spec instead, you end up in a twisty maze of definitions of things that are all closely related, but generally not the same thing (code positions, code units, code spaces, abstract characters, glyphs, graphemes, etc).

The main advantage of using the more formal "code point" over the informal "character" is that it discourages people from assuming they already know what code points are (the usual mistaken assumption being that Unicode code points correspond directly to glyphs, the way ASCII and Extended ASCII printable characters correspond to their glyphs). The rest of the paragraph then provides the mechanical details of how code points can be meaningfully interpreted in Python (as length 1 strings and as numbers in a particular range) and the operations for translating between those two formats (chr and ord).
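
As a rough illustration of the two interpretations and of the glyph mismatch, here is a minimal sketch. It assumes Python 3.3 or later; the variable names are made up for the example and are not taken from the docs paragraph under discussion.

```python
import unicodedata

# chr() and ord() translate between the two views of a code point.
a_as_number = ord("A")              # 0x41
a_as_string = chr(0x41)             # "A"
top_length = len(chr(0x10FFFF))     # the highest code point is still a length 1 string

# Values outside the range 0 to 0x10FFFF are rejected by chr().
try:
    chr(0x110000)
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False

# One glyph on screen, two different code point sequences.
composed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT
lengths = (len(composed), len(decomposed))
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

Both strings typically render as the same glyph, yet they have different lengths and compare unequal until normalized, which is the kind of surprise the word "character" tends to hide.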

Fair point about the slicing - it may be better to just talk about indexing.
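
For what indexing means at the code point level, a small sketch, again assuming Python 3.3 or later (older narrow builds behave differently here). It only shows the semantics and says nothing about the time complexity the issue is concerned with.

```python
# U+1F600 lies outside the Basic Multilingual Plane.
s = "a\U0001F600b"

length = len(s)                   # counted in code points, not code units
middle = s[1]                     # indexing returns a length 1 string
middle_code_point = ord(middle)   # the same code point, viewed as a number
```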

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21667>
_______________________________________