On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote:
Glenn Linderman writes:

 > How many different iterators into the same text would be concurrently 
 > needed by an application?  And why?

A WYSIWYG editor for structured text (TeX, HTML) might want two (at
least), one for the "source" window and one for the "rendered" window.
One might want to save the state of the iterators (if that's possible)
and cache it as one moves the "window" forward to make short backward
motion fast, giving you two (or four, etc) more.

Sure.  But those are probably all the same type of iterator; since they are WYSIWYG, they are probably dealing with multi-codepoint characters (Guido's recent definition of grapheme, which seems to subsume both grapheme clusters and composed characters).

Hence all of them would use, and require, the same sort of representation, index, analysis, or some combination of those.
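
To make "same type of iterator" concrete, here is a rough sketch of what I have in mind. It is a simplification, not a real grapheme cluster implementation: it only attaches combining marks to the preceding base code point, and the name iter_graphemes is made up for illustration. The point is that each iterator's whole state is one code point offset, so saving and restoring positions for several windows is cheap, and all of them share one representation.

```python
import unicodedata

def iter_graphemes(text, start=0):
    """Yield (offset, cluster) pairs from a str.

    Simplified assumption: a cluster is one base code point followed by
    any combining marks.  `start` must already be a cluster boundary,
    e.g. an offset previously yielded by this iterator.
    """
    i = start
    n = len(text)
    while i < n:
        j = i + 1
        # Absorb combining marks (nonzero canonical combining class).
        while j < n and unicodedata.combining(text[j]) != 0:
            j += 1
        yield i, text[i:j]
        i = j

# 'e' + COMBINING ACUTE ACCENT, then 'a': two clusters, three code points.
sample = "e\u0301a"
source_view = list(iter_graphemes(sample))

# A second iterator resumed from a saved offset (say, the "rendered" window).
saved_offset = source_view[1][0]
rendered_view = list(iter_graphemes(sample, saved_offset))
```

Both views are the same kind of object walking the same string; only the saved offset differs.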

 > Seems like if it is dealing with text at the level of grapheme
 > clusters, it needs that type of iterator.  Of course, if it does
 > I/O it needs codec access, but that is by nature sequential from
 > the starting point to the end point.

`save-region' ?  `save-text-remove-markup' ?

Yes, save-region sounds like exactly what I was speaking of.  save-text-remove-markup, I would infer, needs to process the text to remove the markup characters.  Since you used TeX and HTML as examples, the markup is text, not binary (which would be a different problem).  And since TeX and HTML markup is mostly ASCII, markup removal (or more likely, text extraction) could be performed with a grapheme iterator, a codepoint iterator, or even a code unit iterator.
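
A toy illustration of that last point, assuming HTML-like tags delimited by ASCII '<' and '>' and, for the code unit case, UTF-8 as the encoding.  The function names are made up, and this ignores comments, entities, and '>' inside attribute values, so it is a sketch rather than a real markup stripper.  Because the delimiters are ASCII, and ASCII bytes never occur inside a UTF-8 multi-byte sequence, the same loop works whether it walks code points or code units.

```python
def strip_tags_codepoints(text):
    """Drop <...> spans while iterating over code points of a str."""
    out = []
    in_tag = False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)

def strip_tags_code_units(data):
    """Drop <...> spans while iterating over UTF-8 code units (bytes)."""
    out = bytearray()
    in_tag = False
    for b in data:
        if b == 0x3C:                  # ASCII '<'
            in_tag = True
        elif b == 0x3E and in_tag:     # ASCII '>'
            in_tag = False
        elif not in_tag:
            out.append(b)
    return bytes(out)

src = "<p>caf\u00e9 <b>na\u00efve</b></p>"
by_codepoint = strip_tags_codepoints(src)
by_code_unit = strip_tags_code_units(src.encode("utf-8")).decode("utf-8")
```

Both paths extract the same text; a grapheme iterator would do so as well, it just does more work than this task needs.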