
On Sat, 27 Aug 2011 12:17:18 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Paul Moore wrote:
> > IronPython and Jython can retain UTF-16 as their native form if that makes interop cleaner, but in doing so they need to ensure that basic operations like indexing and len work in terms of code points, not code units, if they are to conform. ... They lose the O(1) guarantee, but that's easily defensible as a tradeoff to conform to underlying runtime semantics.
>
> I would only agree as long as it wasn't too much worse than O(1). O(log n) might be all right, but O(n) would be unacceptable, I think.
It also depends a lot on *actual* measured performance. As someone mentioned in the tracker, the index you use on a string usually comes from a previous string operation (like a search), perhaps with a small offset. So a caching scheme may actually give very good results with a rather small overhead (you could cache, say, the 4 most recent indices and choose the nearest when an indexing operation is done; with UTF-8, scanning backward and forward is equally simple).

Regards

Antoine.
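
For illustration only, here is a rough Python sketch of the caching idea described above. It is not how any existing implementation works; the class name `CachedUTF8`, the helper `_offset`, and the `steps` counter are all made up for this example. It keeps the text as UTF-8 bytes, remembers the 4 most recently resolved (code point index, byte offset) pairs, and starts each lookup from whichever known position is nearest, scanning forward or backward over continuation bytes (those of the form `10xxxxxx`).

```python
class CachedUTF8:
    """Code-point indexing over UTF-8 bytes with a small position cache.

    Hypothetical sketch: all names here are illustrative assumptions.
    """

    CACHE_SIZE = 4  # "say, the 4 most recent indices"

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.length = len(text)   # number of code points
        self.cache = []           # (code point index, byte offset), newest last
        self.steps = 0            # code points scanned so far, for measurement

    def _offset(self, index):
        """Return the byte offset of the code point at `index`."""
        data = self.data
        # Known positions: both ends of the buffer plus the cached entries.
        known = [(0, 0), (self.length, len(data))] + self.cache
        cp, off = min(known, key=lambda entry: abs(entry[0] - index))
        while cp < index:         # scan forward one code point at a time
            off += 1
            while off < len(data) and (data[off] & 0xC0) == 0x80:
                off += 1
            cp += 1
            self.steps += 1
        while cp > index:         # scan backward one code point at a time
            off -= 1
            while (data[off] & 0xC0) == 0x80:
                off -= 1
            cp -= 1
            self.steps += 1
        # Record this position, keeping only the most recent entries.
        entry = (index, off)
        if entry in self.cache:
            self.cache.remove(entry)
        self.cache.append(entry)
        del self.cache[:-self.CACHE_SIZE]
        return off

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("string index out of range")
        start = self._offset(index)
        end = start + 1
        while end < len(self.data) and (self.data[end] & 0xC0) == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")


if __name__ == "__main__":
    s = CachedUTF8("a\u00e9\U0001D11E" * 1000)
    s[1500]                       # cold lookup: scans from one end
    before = s.steps
    s[1502]                       # nearby lookup: starts from the cached spot
    print("extra code points scanned:", s.steps - before)
```

The worst case is still O(n) for a lookup far from every known position, so whether this is acceptable would have to be settled by measurement on real workloads, as noted above.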