[I18n-sig] How does Python Unicode treat surrogates?

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 26 Jun 2001 02:07:43 +0200


> 16-bit char, a high surrogate, a low surrogate, and another regular
> 16-bit char.  You're saying that u[0] should return the first
> character, u[1] the entire surrogate (so it would still be a 2-item
> string), u[2] I guess the empty string, and u[3] the final regular
> char.
> 
> IMO that would break an important invariant of string-like objects,
> namely that len(s[i]) == 1.

No, it wouldn't. s[1] would return a string containing 2 Py_UNICODE
values, but len(s[1]) would still be 1.
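
To illustrate, here is a minimal sketch in Python, not the actual
implementation. It assumes a hypothetical class, CodePointString, that
stores a list of 16-bit code units (stand-ins for Py_UNICODE values) and
counts a high/low surrogate pair as a single item for both indexing and
len(). The class name, the units attribute, and the _spans helper are
all made up for this example.

```python
class CodePointString:
    """Hypothetical string type: stores 16-bit code units, indexes by code point."""

    def __init__(self, units):
        # 16-bit code units, standing in for Py_UNICODE values.
        self.units = list(units)

    def _spans(self):
        # Yield (start, end) slices of self.units, one slice per code point.
        i, n = 0, len(self.units)
        while i < n:
            is_pair = (
                0xD800 <= self.units[i] <= 0xDBFF          # high surrogate
                and i + 1 < n
                and 0xDC00 <= self.units[i + 1] <= 0xDFFF  # low surrogate
            )
            step = 2 if is_pair else 1
            yield i, i + step
            i += step

    def __len__(self):
        # Length is counted in code points, not in code units.
        return sum(1 for _ in self._spans())

    def __getitem__(self, index):
        # Return the index-th code point as a new string of 1 or 2 code units.
        start, end = list(self._spans())[index]
        return CodePointString(self.units[start:end])


# A regular char, a high surrogate, a low surrogate, another regular char.
s = CodePointString([0x0041, 0xD800, 0xDC00, 0x0042])
print(len(s[1].units))  # 2 code units in the item
print(len(s[1]))        # 1, so len(s[i]) == 1 still holds
```

Note that in this sketch indices count code points, so the four code
units form a string of length 3 and s[2] is the final regular char.
That indexing choice is an assumption of the sketch.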

Regards,
Martin