[Python-Dev] len(chr(i)) = 2?

Stephen J. Turnbull stephen at xemacs.org
Fri Nov 26 04:02:09 CET 2010


M.-A. Lemburg writes:

 > Please note that we can only provide one way of string indexing
 > in Python using the standard s[1] notation, and since we want
 > that operation to be fast and no more than O(1), using the
 > code units as items is the only reasonable way to implement it.

AFAICT, the "we" that wants "no more than O(1)" does not include Glyph
Lefkowitz, James Knight, and Greg Ewing.  Greg even said that in
designing a UTF-8 string type he might not provide an indexing
operation at all.  (Caution: That may not be what he meant; I'm just
reporting the way I interpreted it.)  Of course none of them are
proposing to change Python, that's all in the context of designing a
new language.  But it does suggest that a lot of people can't think of
use cases where O(1) string indexing is more important than Unicode
robustness.
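
To make the trade-off concrete, here is a minimal sketch of the gap
between code points and UTF-16 code units. It is not taken from the
thread: it assumes Python 3.3 or later (where str is indexed by code
point), and the helper name utf16_code_units is hypothetical.

```python
# Assumption: Python 3.3+, where len() and s[i] count code points.
# utf16_code_units is a hypothetical helper written for this sketch.

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

s = "a\U00010000b"            # U+10000 lies outside the BMP

units = utf16_code_units(s)
code_points = len(s)          # items seen when indexing by code point
code_units = len(units)       # items seen when indexing by code unit

# Indexing by code unit can land on half of a surrogate pair.
high, low = units[1], units[2]
```

Indexing by code unit stays O(1) over a UTF-16 buffer, but units[1] on
its own is a lone surrogate rather than a character.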

 > It is by far more important to maintain round-trip safety for
 > Unicode data, than getting every bit of code work correctly
 > with surrogates (often, there won't be a single correct way).

But surely it's more important than that to ensure that surrogates
can't crash a Python process with unexpected UnicodeErrors?
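
As an illustration of the failure mode in question, the sketch below
shows a lone surrogate raising UnicodeEncodeError when encoded. It
assumes Python 3.1 or later (for the "surrogatepass" error handler)
and is not code from the thread.

```python
# Assumption: Python 3.1+, which provides the "surrogatepass" handler.

lone = "\ud800"   # a lone high surrogate, which a str can hold

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    lone.encode("utf-8")
    raised = False
except UnicodeEncodeError:
    raised = True

# Opting in to "surrogatepass" keeps the data round-trip safe instead.
encoded = lone.encode("utf-8", "surrogatepass")
round_tripped = encoded.decode("utf-8", "surrogatepass")
```

Code that slices a string between the halves of a surrogate pair can
produce such a lone surrogate, and the error then surfaces only later,
at encoding time.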


