converting text and spans to an ElementTree

attn.steven.kuo at gmail.com
Tue May 22 15:22:07 EDT 2007


On May 21, 11:02 pm, Steven Bethard <steven.beth... at gmail.com> wrote:
> I have some text and a list of Element objects and their offsets, e.g.::
>
>      >>> text = 'aaa aaa aaabbb bbbaaa'
>      >>> spans = [
>      ...     (etree.Element('a'), 0, 21),
>      ...     (etree.Element('b'), 11, 18),
>      ...     (etree.Element('c'), 18, 18),
>      ... ]
>
> I'd like to produce the corresponding ElementTree. So I want to write a
> get_tree() function that works like::
>
>      >>> tree = get_tree(text, spans)
>      >>> etree.tostring(tree)
>      '<a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>'
>
> Perhaps I just need some more sleep, but I can't see an obvious way to
> do this. Any suggestions?
>


It seems you're looking to construct an interval tree:

    http://en.wikipedia.org/wiki/Interval_tree
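When the spans are guaranteed to nest properly, as in the example where `b` lies wholly inside `a` and `c` sits at `b`'s end, a simpler technique may be enough. It is not an interval tree. You sort the spans outermost first (ascending start, then descending end) and keep a stack of open elements, placing each run of text either as an element's `.text` or as its last child's `.tail`. The sketch below assumes Python 3's `xml.etree.ElementTree`, no partially overlapping spans, and one span covering the whole text. `get_tree` is the name from the question, while the helper `_append_text` is hypothetical.

```python
import xml.etree.ElementTree as etree


def _append_text(parent, s):
    """Hypothetical helper: add s after parent's last child, or as parent.text."""
    if not s:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + s
    else:
        parent.text = (parent.text or "") + s


def get_tree(text, spans):
    """Nest (element, start, end) spans over text and return the root Element.

    Assumes spans never partially overlap and that one span covers the
    whole text. A zero-width span at a parent's end offset (like 'c' in
    the example) stays inside that parent. The elements are modified.
    """
    # Outer spans sort first: ascending start, then descending end.
    ordered = sorted(spans, key=lambda span: (span[1], -span[2]))
    root, root_start, root_end = ordered[0]
    if (root_start, root_end) != (0, len(text)):
        raise ValueError("no span covers the whole text")

    stack = [(root, root_end)]  # open elements with their end offsets
    pos = 0                     # first character not yet placed in the tree
    for elem, start, end in ordered[1:]:
        # Close open elements that cannot contain this span.
        while end > stack[-1][1]:
            closed, closed_end = stack.pop()
            _append_text(closed, text[pos:closed_end])
            pos = closed_end
        parent = stack[-1][0]
        _append_text(parent, text[pos:start])  # text before the new child
        pos = start
        parent.append(elem)
        stack.append((elem, end))

    # Close everything still open, innermost first.
    while stack:
        closed, closed_end = stack.pop()
        _append_text(closed, text[pos:closed_end])
        pos = closed_end
    return root


text = 'aaa aaa aaabbb bbbaaa'
spans = [
    (etree.Element('a'), 0, 21),
    (etree.Element('b'), 11, 18),
    (etree.Element('c'), 18, 18),
]
tree = get_tree(text, spans)
print(etree.tostring(tree, encoding='unicode'))
# <a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>
```

In Python 3, `etree.tostring` returns bytes by default, so `encoding='unicode'` is passed to get a `str` comparable to the output in the question. If spans can partially overlap, this stack approach no longer applies, and a structure such as an interval tree becomes more relevant.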

--
Hope this helps,
Steven




More information about the Python-list mailing list