Re: [Python-Dev] PEP 393 Summer of Code Project

Aug. 26, 2011

On Sat, 27 Aug 2011 12:17:18 +1200
Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
> ...
> Paul Moore wrote:
> > ...
> > IronPython and Jython can retain UTF-16 as their native form if that
> > makes interop cleaner, but in doing so they need to ensure that basic
> > operations like indexing and len work in terms of code points, not
> > code units, if they are to conform. ... They lose the O(1)
> > guarantee, but that's easily defensible as a tradeoff to conform to
> > underlying runtime semantics.
>
> I would only agree as long as it wasn't too much worse
> than O(1). O(log n) might be all right, but O(n) would be
> unacceptable, I think.

It also depends a lot on *actual* measured performance. As someone
mentioned in the tracker, the index you use on a string usually comes
from a previous string operation (like a search), perhaps with a small
offset. So a caching scheme may actually give very good results with a
rather small overhead (you could cache, say, the 4 most recent indices
and choose the nearest when an indexing operation is done; with utf-8,
scanning backward and forward is equally simple).
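
To make the caching idea a bit more concrete, here is a rough sketch in
Python. It is only an illustration under my own assumptions: the class name
CachedUtf8Str, the `scanned` counter and the "keep the newest entries"
eviction policy are all hypothetical and not taken from the tracker or from
any existing implementation. It keeps the 4 most recent (code point index,
byte offset) pairs and scans from whichever known position is nearest.

```python
class CachedUtf8Str:
    """Hypothetical UTF-8 backed string with cached code point indexing."""

    CACHE_SIZE = 4  # "say, the 4 most recent indices"

    def __init__(self, text):
        self._data = text.encode("utf-8")
        self._length = len(text)  # length in code points
        self._cache = []          # (code point index, byte offset), newest last
        self.scanned = 0          # code points stepped over, for measurement

    def __len__(self):
        return self._length

    @staticmethod
    def _is_continuation(byte):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        return byte & 0xC0 == 0x80

    def _remember(self, index, offset):
        entry = (index, offset)
        if entry in self._cache:
            self._cache.remove(entry)
        self._cache.append(entry)
        del self._cache[:-self.CACHE_SIZE]  # drop all but the newest entries

    def _byte_offset(self, index):
        data = self._data
        # Known positions: both ends of the string plus the cached ones.
        anchors = [(0, 0), (self._length, len(data))] + self._cache
        cp, off = min(anchors, key=lambda a: abs(a[0] - index))
        while cp < index:  # scan forward, one code point at a time
            off += 1
            while off < len(data) and self._is_continuation(data[off]):
                off += 1
            cp += 1
            self.scanned += 1
        while cp > index:  # scan backward, one code point at a time
            off -= 1
            while self._is_continuation(data[off]):
                off -= 1
            cp -= 1
            self.scanned += 1
        self._remember(index, off)
        return off

    def __getitem__(self, index):
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("string index out of range")
        start = self._byte_offset(index)
        end = start + 1
        while end < len(self._data) and self._is_continuation(self._data[end]):
            end += 1
        return self._data[start:end].decode("utf-8")


text = "na\u00efve caf\u00e9 \u20ac5 \U0001d11e"
s = CachedUtf8Str(text)
assert all(s[i] == text[i] for i in range(len(text)))
```

The worst case is still O(n) for a random access far from every cached
position, so whether this is good enough comes down to measuring real
access patterns, e.g. by watching the `scanned` counter.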

Regards

Antoine.

Antoine Pitrou