[Python-Dev] PEP 393 Summer of Code Project

Thu Sep 1 06:40:55 CEST 2011

On 8/31/2011 5:58 PM, Neil Hodgson wrote:
> Glenn Linderman:
>
>> That said, regexp, or some sort of cursor on a string, might be a workable
>> solution.  Will it have adequate performance?  Perhaps, at least for some
>> applications.  Will it be as conceptually simple as indexing an array of
>> graphemes?  No.  Will it ever reach the efficiency of indexing an array of
>> graphemes? No.  Does that matter? Depends on the application.
>     Using an iterator for cluster access is a common technique
> currently. For example, with the Pango text layout and drawing
> library, you may create a PangoLayoutIter over a text layout object
> (which contains a UTF-8 string along with formatting information) and
> iterate by clusters by calling pango_layout_iter_next_cluster. Direct
> access to clusters by index is not as useful in this domain as access
> by pixel positions - for example to examine the portion of a layout
> visible in a window.
>
>     http://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-get-iter
>     In this API, 'index' is used to refer to a byte index into UTF-8,
> not a character or cluster index.
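
For illustration, here is a minimal Python sketch of cluster iteration in the style described above. The function name `iter_clusters` is hypothetical, and it assumes a simplified cluster rule: a base character followed by any combining marks, as reported by `unicodedata.combining`. That rule is much cruder than full Unicode (UAX #29) grapheme segmentation. As in Pango, each cluster's position is reported as a byte index into the UTF-8 encoding, not as a character or cluster index.

```python
import unicodedata
from typing import Iterator, Tuple

def iter_clusters(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (utf8_byte_offset, cluster) pairs under a simplified cluster rule.

    Assumption: a cluster is one code point followed by any code points with a
    nonzero canonical combining class. This is NOT full UAX #29 segmentation
    (no Hangul, ZWJ, or regional-indicator handling).
    """
    byte_offset = 0
    start = 0
    for i in range(1, len(text) + 1):
        # A cluster ends at the end of the string or before the next base character.
        if i == len(text) or unicodedata.combining(text[i]) == 0:
            cluster = text[start:i]
            yield byte_offset, cluster
            # Advance by the cluster's UTF-8 length, as a byte-indexed API would.
            byte_offset += len(cluster.encode("utf-8"))
            start = i
```

For `"e\u0301le\u0300ve"` (7 code points), this yields 5 clusters, starting at byte offsets 0, 3, 4, 7 and 8. Only the current position is kept between steps, so nothing is added to the string itself.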

I agree that different applications may have different needs for 
different types of indexes to various starting points in a large 
string.  Where a custom index is required, a standard index may not be 
needed.

>     One of the benefits of iterator access to text is that many
> different iterators can be built without burdening the implementation
> object with extra memory costs as would be likely with techniques that
> build indexes into the representation.
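
The memory trade-off can be sketched in Python. This uses the same simplified cluster rule as an assumption: a code point plus any following combining marks, not full UAX #29. The helpers `cluster_starts`, `build_cluster_index` and `cluster_at` are hypothetical. A precomputed index gives constant-time access to the k-th cluster but stores one offset per cluster alongside the string. An iterator stores only its own current position.

```python
import unicodedata
from typing import Iterator, List

def cluster_starts(text: str) -> Iterator[int]:
    """Yield code point offsets where a (simplified) cluster begins.

    Assumption: a cluster is a non-combining code point plus any following
    combining marks; this is a simplification of UAX #29.
    """
    for i, ch in enumerate(text):
        if i == 0 or unicodedata.combining(ch) == 0:
            yield i

def build_cluster_index(text: str) -> List[int]:
    """Precompute every cluster start: O(1) lookup, O(clusters) extra memory."""
    return list(cluster_starts(text))

def cluster_at(text: str, index: List[int], k: int) -> str:
    """Return the k-th cluster using a precomputed index."""
    end = index[k + 1] if k + 1 < len(index) else len(text)
    return text[index[k]:end]
```

Each call to `cluster_starts(text)` creates an independent generator holding only its position, so several can walk the same string concurrently without any per-string storage. `build_cluster_index` pays that storage cost once in exchange for random access.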

How many different iterators into the same text would be concurrently 
needed by an application?  And why?  It seems that an application dealing with 
text at the level of grapheme clusters needs that type of iterator.  
Of course, if it does I/O it needs codec access, but that is by nature 
sequential from the starting point to the end point.
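
To illustrate why codec access is sequential, here is a sketch using the incremental decoder from Python's standard `codecs` module. The helper `decode_stream` is hypothetical and is not taken from PEP 393. It decodes a byte stream chunk by chunk, from start to end. The decoder carries any multi-byte sequence split across a chunk boundary into the next call, so decoding depends on everything before the current chunk.

```python
import codecs
from typing import Iterable, Iterator

def decode_stream(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[str]:
    """Decode byte chunks sequentially, from the start of the stream to the end.

    The incremental decoder buffers any partial multi-byte sequence that is
    split across a chunk boundary, so a boundary may fall anywhere.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True flushes the buffer and raises if a sequence is left incomplete.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

Splitting `"h\u00e9\u20ac".encode("utf-8")` into 2-byte chunks still yields `"h"`, `"\u00e9"` and `"\u20ac"` in order. A truncated stream raises `UnicodeDecodeError` at the final flush.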