[Python-Dev] PEP 393 Summer of Code Project
Glenn Linderman
v+python at g.nevcal.com
Thu Sep 1 11:20:59 CEST 2011
On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
>
> > How many different iterators into the same text would be concurrently
> > needed by an application? And why?
>
> A WYSIWYG editor for structured text (TeX, HTML) might want two (at
> least), one for the "source" window and one for the "rendered" window.
> One might want to save the state of the iterators (if that's possible)
> and cache it as one moves the "window" forward to make short backward
> motion fast, giving you two (or four, etc) more.
Sure. But those are probably all the same type of iterator, and (since
they are WYSIWYG) they probably deal with multi-codepoint characters
(Guido's recent definition of grapheme, which seems to subsume both
grapheme clusters and composed characters).
Hence all of them would be using/requiring the same sort of
representation, index, analysis, or some combination of those.
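To make "multi-codepoint characters" concrete, here is a minimal sketch of
an iterator that groups a base code point with any combining marks that
follow it. This is only an approximation: it is not the full Unicode
grapheme cluster algorithm, and the name approx_graphemes is made up for
illustration.

```python
import unicodedata


def approx_graphemes(text):
    """Yield a base code point together with its trailing combining marks.

    Rough approximation only: real grapheme cluster segmentation has
    many more rules than "attach combining marks to the previous base".
    """
    cluster = ""
    for ch in text:
        # A code point with combining class 0 starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
clusters = list(approx_graphemes(decomposed))
```

Here the string has five code points but the sketch yields four clusters,
which is the unit a WYSIWYG cursor would presumably want to step over.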
> > Seems like if it is dealing with text at the level of grapheme
> > clusters, it needs that type of iterator. Of course, if it does
> > I/O it needs codec access, but that is by nature sequential from
> > the starting point to the end point.
>
> `save-region' ? `save-text-remove-markup' ?
Yes, save-region sounds like exactly what I was speaking of.
save-text-remove-markup, I would infer, needs to process the text to
remove the markup characters. Since you used TeX and HTML as examples,
the markup is text, not binary (which would be a different problem). Since
the TeX and HTML markup is mostly ASCII, markup removal (or, more likely,
text extraction) could be performed with a grapheme iterator, a
codepoint iterator, or even a code unit iterator.
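A minimal sketch of that last point, assuming a deliberately naive notion
of markup (everything between '<' and '>' is dropped; this is not a real
HTML parser). The function names are hypothetical. Because the delimiters
are ASCII, and ASCII byte values never occur inside a multi-byte UTF-8
sequence, the same scan works over code points or over UTF-8 code units.

```python
def strip_tags_codepoints(text):
    """Drop '<...>' spans while iterating over code points."""
    out = []
    in_tag = False
    for ch in text:  # a Python 3 str iterates by code point
        if ch == "<":
            in_tag = True
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)


def strip_tags_code_units(data):
    """Same scan, but over UTF-8 code units (bytes)."""
    out = bytearray()
    in_tag = False
    for unit in data:  # iterating bytes yields ints
        if unit == 0x3C:  # '<'
            in_tag = True
        elif unit == 0x3E and in_tag:  # '>'
            in_tag = False
        elif not in_tag:
            out.append(unit)
    return bytes(out)


sample = "<p>cafe\u0301 <b>na\u00efve</b></p>"
by_codepoint = strip_tags_codepoints(sample)
by_code_unit = strip_tags_code_units(sample.encode("utf-8")).decode("utf-8")
```

Both variants leave the combining accent attached to its base character,
since neither ever splits the text anywhere except at an ASCII delimiter.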