[I18n-sig] How does Python Unicode treat surrogates?
Fredrik Lundh
fredrik@pythonware.com
Mon, 25 Jun 2001 23:43:34 +0200
Tim Peters wrote:
> "The right way" to solve the character (not binary blob) indexing problem is
> to add a search finger to the string, a pair mapping "the last" character
> index asked for to the address of the start of its encoding. Since string
> traversal generally moves ahead-- or back --just one character at a time,
> the point in the first paragraph assures that traversing a string with N
> characters, in whole, takes O(N) time overall. It's not as simple as base +
> offset, but requires no more than a few range compares (plus updating the
> finger) per indexing operation.
plus the time it takes to acquire and release a thread lock
for each character...
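
A minimal Python sketch of the search finger idea from the quoted paragraph, assuming the string is stored internally as a UTF-8 byte buffer. The class name FingerString and its layout are hypothetical and not taken from any Python implementation. For brevity it always walks from the finger rather than also considering the start or end of the buffer. The lock shows where the per-lookup cost mentioned above would come from: the finger is mutable state shared by every reader of an otherwise immutable string.

```python
import threading


class FingerString:
    """Variable-width (UTF-8) string with a search finger. Illustration only."""

    def __init__(self, text):
        self._buf = text.encode("utf-8")
        # Character count = number of bytes that are not continuation bytes.
        self._len = sum(1 for b in self._buf if b & 0xC0 != 0x80)
        # The finger: (last character index asked for, byte offset of its start).
        self._finger = (0, 0)
        # Assumption: lookups may come from several threads, so the finger
        # must be read and updated under a lock.
        self._lock = threading.Lock()

    def __len__(self):
        return self._len

    def _is_start(self, pos):
        # True if the byte at pos begins a character (not 10xxxxxx).
        return self._buf[pos] & 0xC0 != 0x80

    def __getitem__(self, index):
        if not 0 <= index < self._len:
            raise IndexError(index)
        with self._lock:
            char, pos = self._finger
            # Step forward one character at a time until we reach index.
            while char < index:
                pos += 1
                while not self._is_start(pos):
                    pos += 1
                char += 1
            # Or step backward one character at a time.
            while char > index:
                pos -= 1
                while not self._is_start(pos):
                    pos -= 1
                char -= 1
            self._finger = (char, pos)
        # Find the end of this character's encoding and decode it.
        end = pos + 1
        while end < len(self._buf) and not self._is_start(end):
            end += 1
        return self._buf[pos:end].decode("utf-8")
```

Sequential traversal such as `[s[i] for i in range(len(s))]` moves the finger by one character per lookup, so the whole pass is linear in the buffer size, but every lookup still pays for one lock acquire and release.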
</F>