<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#330033">
On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote:
<blockquote cite="mid:87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp"
type="cite">
<pre wrap="">Glenn Linderman writes:
> How many different iterators into the same text would be concurrently
> needed by an application? And why?
A WYSIWYG editor for structured text (TeX, HTML) might want two (at
least), one for the "source" window and one for the "rendered" window.
One might want to save the state of the iterators (if that's possible)
and cache it as one moves the "window" forward to make short backward
motion fast, giving you two (or four, etc) more.</pre>
</blockquote>
<br>
Sure. But those are probably all the same type of iterator. Since
the editor is WYSIWYG, they probably all deal with multi-codepoint
characters (Guido's recent definition of grapheme, which seems to
subsume both grapheme clusters and composed characters).<br>
<br>
Hence all of them would be using/requiring the same sort of
representation, index, analysis, or some combination of those.<br>
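<br>
To make that concrete, here is a minimal Python sketch of several
iterators sharing one analysis of the same text. The names
cluster_starts, ClusterView, and iter_from are hypothetical, and the
"cluster" rule (a base code point plus any following combining marks)
is a deliberate simplification of real grapheme cluster segmentation,
so treat it as an illustration rather than an implementation.<br>
<pre wrap="">
```python
import unicodedata


def cluster_starts(text):
    """Return the offsets at which a new (simplified) cluster begins.

    Assumption: a cluster is a base code point followed by zero or more
    combining marks. Full grapheme cluster rules are more involved.
    """
    return [i for i, ch in enumerate(text)
            if i == 0 or unicodedata.combining(ch) == 0]


class ClusterView:
    """Hypothetical shared index: built once, reused by every iterator."""

    def __init__(self, text):
        self.text = text
        self.starts = cluster_starts(text)

    def iter_from(self, n=0):
        """Yield clusters, beginning at cluster number n."""
        bounds = self.starts + [len(self.text)]
        for k in range(n, len(self.starts)):
            yield self.text[bounds[k]:bounds[k + 1]]


# 'e' + COMBINING ACUTE ACCENT, followed by "tude"
view = ClusterView("e\u0301tude")
source_window = view.iter_from(0)    # e.g. the "source" window
rendered_window = view.iter_from(2)  # e.g. the "rendered" window
```
</pre>
Both iterators walk the same boundary list, so the cost of the
analysis is paid once no matter how many windows are open.<br>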
<br>
<blockquote cite="mid:87k49sk4hl.fsf@uwakimon.sk.tsukuba.ac.jp"
type="cite">
<pre wrap=""> > Seems like if it is dealing with text at the level of grapheme
> clusters, it needs that type of iterator. Of course, if it does
> I/O it needs codec access, but that is by nature sequential from
> the starting point to the end point.
`save-region' ? `save-text-remove-markup' ?</pre>
</blockquote>
<br>
Yes, save-region sounds like exactly what I was speaking of. I would
infer that save-text-remove-markup needs to process the text to
remove the markup characters. Since you used TeX and HTML as
examples, the markup is text, not binary (which would be a different
problem). Since TeX and HTML markup is mostly ASCII, markup
removal (or more likely, text extraction) could be performed via
a grapheme iterator, a codepoint iterator, or even a code
unit iterator.<br>
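<br>
A small Python sketch of why the level does not matter for ASCII
markup. It uses regular expressions rather than explicit iterators,
and a much simplified idea of TeX markup (control words such as \emph
plus grouping braces), both of which are my assumptions for the sake
of a short example. The function names are hypothetical. The byte
version works because in UTF-8 every byte of a non-ASCII character is
0x80 or above, so it can never be mistaken for a backslash, a brace,
or an ASCII letter.<br>
<pre wrap="">
```python
import re

# Simplified TeX markup (an assumption): control words and braces only.
TEX_STR = re.compile(r"\\[A-Za-z]+|[{}]")
TEX_BYTES = re.compile(rb"\\[A-Za-z]+|[{}]")


def strip_codepoints(text):
    """Remove markup while scanning code points (a str)."""
    return TEX_STR.sub("", text)


def strip_code_units(data):
    """Remove markup while scanning UTF-8 code units (bytes)."""
    return TEX_BYTES.sub(b"", data)


# U+00EF is a precomposed i with diaeresis; 'e' + U+0301 is a
# two-codepoint character. Neither is touched by the ASCII patterns.
sample = "\\emph{na\u00efve} cafe\u0301"

by_codepoint = strip_codepoints(sample)
by_code_unit = strip_code_units(sample.encode("utf-8")).decode("utf-8")
```
</pre>
Both routes leave the non-ASCII text intact and produce the same
result, without either one knowing where the grapheme boundaries are.<br>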
</body>
</html>