<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#330033">

    On 8/31/2011 5:58 PM, Neil Hodgson wrote:

    <blockquote

cite="mid:CAMLCkUeqSVt7LirPEvj_=nZp7nwb9uS8z4ba7LK2dFHdmXrQhw@mail.gmail.com"

      type="cite">

      <pre wrap="">Glenn Linderman:

</pre>

      <blockquote type="cite">

        <pre wrap="">That said, regexp, or some sort of cursor on a string, might be a workable

solution.  Will it have adequate performance?  Perhaps, at least for some

applications.  Will it be as conceptually simple as indexing an array of

graphemes?  No.  Will it ever reach the efficiency of indexing an array of

graphemes? No.  Does that matter? Depends on the application.

</pre>

      </blockquote>

      <pre wrap="">

   Using an iterator for cluster access is a common technique

currently. For example, with the Pango text layout and drawing

library, you may create a PangoLayoutIter over a text layout object

(which contains a UTF-8 string along with formatting information) and

iterate by clusters by calling pango_layout_iter_next_cluster. Direct

access to clusters by index is not as useful in this domain as access

by pixel positions - for example to examine the portion of a layout

visible in a window.

   <a class="moz-txt-link-freetext" href="http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter">http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter</a>

   In this API, 'index' is used to refer to a byte index into UTF-8,

not a character or cluster index.</pre>

    </blockquote>

    <br>

    I agree that different applications need different kinds of

    indexes into a large string: by byte, by code point, by cluster, or,

    as in your Pango example, by pixel position.  Where a custom index

    is required, a standard index may not be needed.<br>
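    <br>
    To make the byte-index versus character-index distinction concrete,
    here is a minimal Python sketch of one such custom index.  The helper
    name utf8_offsets is made up for illustration; it is not part of Pango
    or of any Python library, and it assumes the text contains no lone
    surrogates.<br>
    <pre wrap="">
```python
def utf8_offsets(text):
    """Return the UTF-8 byte offset at which each code point of text starts."""
    offsets = []
    pos = 0
    for ch in text:
        offsets.append(pos)
        # Each code point occupies 1 to 4 bytes once encoded as UTF-8.
        pos += len(ch.encode("utf-8"))
    return offsets

text = "na\u00efve"            # 5 code points, 6 bytes in UTF-8
offsets = utf8_offsets(text)  # [0, 1, 2, 4, 5]
```
</pre>
    The list costs one integer per code point, which is exactly the kind
    of extra memory an application would only want to pay when it needs
    that particular index.<br>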

    <br>

    <blockquote

cite="mid:CAMLCkUeqSVt7LirPEvj_=nZp7nwb9uS8z4ba7LK2dFHdmXrQhw@mail.gmail.com"

      type="cite">

      <pre wrap="">   One of the benefits of iterator access to text is that many

different iterators can be built without burdening the implementation

object with extra memory costs as would be likely with techniques that

build indexes into the representation.

</pre>

    </blockquote>

    <br>

    How many different iterators over the same text would an

    application need concurrently, and why?  It seems that if an

    application works with text at the level of grapheme clusters, it

    needs that type of iterator.  Of course, if it does I/O it also

    needs codec access, but that is by nature sequential from the

    starting point to the end point.<br>
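    <br>
    For reference, the iterator approach under discussion might look
    roughly like the following in Python.  This is only a sketch: the name
    iter_clusters is hypothetical, and treating "a base code point plus any
    following combining marks" as a cluster is a deliberate simplification,
    not the full Unicode grapheme cluster segmentation rules.<br>
    <pre wrap="">
```python
import unicodedata

def iter_clusters(text):
    """Yield approximate grapheme clusters from text.

    Simplifying assumption: a cluster is a code point followed by any
    code points with a nonzero canonical combining class.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base code point starts; emit the finished cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster

text = "e\u0301a\u0308o"       # e + acute, a + diaeresis, o
first = iter_clusters(text)    # two independent iterators over one string
second = iter_clusters(text)
next(first)                    # advancing one does not move the other
```
</pre>
    Each iterator keeps its own position, and the string itself stores
    nothing extra, which is the memory argument made in the quoted
    text.<br>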

  </body>

</html>